HETERODUPLEX TRACKING ASSAY (HTA) FOR GENOTYPING HCV



Field of the Invention 

This invention relates to genotyping hepatitis C viruses (HCV). In particular, this 
invention relates to specific primers preferably from the core and envelope region of 
HCV and a method to determine genotypes of HCV with a heteroduplex mobility or 
tracking assay which, in turn, utilizes specific primers. 

Background of the Invention 

Viral hepatitis is known to be caused by five different viruses known as hepatitis
A, B, C, D, and E. HAV is an RNA virus and does not lead to long-term clinical
symptoms. HBV is a DNA virus. HDV is a dependent virus that is unable to infect cells
in the absence of HBV. HEV is a water-borne virus. HCV was first identified and
characterized as a cause of non-A, non-B hepatitis (NANBH) (Houghton et al., EPO Pub.
Nos. 388,232 and 318,216). This led to the disclosure of a number of general and
specific polypeptides useful as immunological reagents in identifying HCV. See, e.g.,
Choo et al., (1989) Science 244:359-362; Kuo et al., (1989) Science 244:362-364; and
Houghton et al., (1991) Hepatology 14:381-388.

HCV is a single stranded RNA virus, distantly related to the pestiviruses and
flaviviruses, and it is the causative agent of the vast majority of transfusion-associated
hepatitis and of most cases of community-acquired non-A, non-B hepatitis worldwide.
The HCV genome consists of 5' and 3' noncoding (NC) regions that flank a single long
open reading frame (ORF). This ORF encodes three structural proteins at the
amino-terminal end and six nonstructural (NS) proteins at the carboxyl-terminal end. The
structural proteins are the nucleocapsid (core; C) protein and two
glycoproteins, envelope 1 (E1) and envelope 2 (E2). The nonstructural proteins are
named NS2, NS3, NS4a, NS4b, NS5a and NS5b. The 5' NCR is the most highly conserved
part of the HCV genome, whereas the sequence of the two envelope proteins (E1 and E2)
is highly variable among different HCV isolates. The highest degree of variation has
been observed in a region within E2, now commonly termed hypervariable region 1
(HVR1) or E2HV. A second variable region called HVR2 also exists in a subset of
isolates. Typically, the genetic heterogeneity of HCV has been classified under two
headings: quasispecies and genotypes. As used herein the term "quasispecies" refers to
the genetic heterogeneity of the HCV population within an infected individual. As used
herein the terms "genotype" and "subtype" refer to the genome heterogeneity observed
among different HCV isolates. The analysis of nucleic acid sequence variation of the
HCV genome, a positive-stranded RNA molecule of approximately 9.4 kb, suggests that
genetic variability has important virological and clinical implications.

The prototype isolate of HCV was characterized in EP Publication Nos. 318,216
and 388,232. As used herein, the term "HCV" includes newly isolated NANBH viral
species. The term "HCV-1" refers to the virus described in the above-mentioned
publications.

Since the initial identification of HCV, at least 6 different major viral types have
been identified (full length genomes reported) and designated Type 1, 2, 3, 4, 5 and 6.
Within these types are numerous subtypes. The type of virus with which a patient is
infected may affect the clinical prognosis and also the response to various treatments. See
Yoshioka et al., (1992) Hepatology 16:293-299. Considering that the most serious
clinical outcome of HCV infection is hepatocellular carcinoma, it would be useful to be
able to determine with which type or types of HCV a patient is infected. It is thus of
particular importance to develop an accurate, reliable assay for HCV genotyping and
subtyping that, without requiring sequencing, could also give the genetic divergence
within a subtype. Several classifications have been proposed for HCV genotyping based on
analysis of different regions, because the ideal nucleotide sequence-based system, using
the complete viral genome, is not practical.

Summary of the Invention 

The present invention includes primers and methods for HCV genotyping and for
the characterization of intra-subtype variation, based on the heteroduplex tracking assay
(HTA). The preferred probes/primers are single stranded and are derived from the carboxyl
terminus of core and part of the E1 region of HCV.

The HTA is a hybridization-based method of determining the genetic relationship
between two or more viral genomes. The basis of the method is that related DNA
products coamplified from divergent templates reanneal randomly to form
heteroduplexes that migrate with reduced mobility in systems designed to
separate molecules on the basis of size, such as neutral polyacrylamide gels. HTA was
originally used to genotype HIV-1 and to follow the in vivo evolution of HIV-1 in
patients and populations. See, e.g., Delwart et al., (1993) Science 262:1257-1261 and
Delwart et al., (1994) J. Virol. 68:6672-6683.

One aspect of the invention is a method for genotyping HCV comprising the
steps of denaturing and reannealing partially complementary DNA or RNA strands and
detecting sequence variation by noting electrophoretic mobility of the DNA
heteroduplexes on a system designed to separate molecules on the basis of size, such as by
electrophoresis through a polyacrylamide or MDE gel.

Another aspect of the invention relates to the probes used in the genotyping,
which were selected from the core and E1 region of the HCV genome.

Another aspect of the invention relates to a method of predicting the response to 
drug therapy of a patient infected with a strain of HCV by determining the sensitivity of 
different known genotypes to drug therapy, determining the genotype of the HCV strain 
infecting the patient and comparing the genotype with its drug therapy sensitivity to 
predict the patient's response to the drug therapy.

Another aspect of the invention relates to therapeutic vaccines and predicting 
which therapeutic vaccine should be utilized by determining the genotype of a patient 
infected with a strain of HCV and administering a therapeutic vaccine of the same 
genotype. 

Another aspect of the invention relates to prophylactic vaccines and predicting
which vaccine should be administered to a certain population sample by determining the
prevalent genotypes in a like sample and administering a prophylactic vaccine of a
genotype likely to be the prevalent genotype in the population sample.

Another aspect of the invention relates to the ability to discover new
genotypes of HCV using the method of the invention.

Brief Description of the Figures

Figures 1A - 1E are autoradiograms showing homoduplexes and heteroduplexes
of the samples to be typed with the probes of known genotypes (ss probes are of
genotypes 1a, 1b, 2a, 2b, 3a in Figs. 1A - 1E respectively, lane on far left of MDE gel).
The homoduplex (h) (ss probe to the double stranded RT-PCR product of known
genotype from which it was derived) is shown adjacent to the probe. The
heteroduplexes of the RT-PCR products from the 15 dialysis patients (nos. 1, 2, 3, 4, 7,
18, 20, 22, 23, 24, 26, 28, 30, 33, 35) hybridized to the ss probe are designated above the
appropriate lane in each Figure.

Figures 2A - 2C are dendrograms, i.e., phylogenetic trees showing the relatedness
of each partial E1 nucleotide sequence, formed by comparing partial E1 sequences
obtained by sequencing of putative type 1 (nt 625-930), type 2 (nt 583-915) or type 3 (nt
558-834) isolates from the dialysis patients described herein to published genotype
sequences for type 1a (HCV-1) (Choo, et al., PNAS (1991) 88:2451-2455, all nucleotide,
"nt", designations according to this paper), 1b (HCV-J) (Kato et al., PNAS (1990)
87:9524-9528), 2a (HC-J6) (Okamoto et al., Virol. (1992) 188:331-341), 2b (HC-J8)
(Okamoto et al., Virol. (1992) 188:331-341), 2c (Bukh, et al., PNAS (1993) 90:8234-8239)
and 3a (NZL-1) (Sakamoto, et al., J. Gen. Virol. (1994) 75:1761-1768) over the same
region of the genome.

Figure 2D is a dendrogram, i.e., a phylogenetic tree, formed by comparing partial
5' UTR sequences of isolates 23, 30 and 33 obtained by direct sequencing with published
type 1, 2 and 3 (nt -274 to -81) genotype sequences for the same region of the genome.

Figures 3A - 3D show the nucleotide sequences for the dendrograms depicted in
Figures 2A - 2D.

Description of the Invention 

The practice of the present invention will employ, unless otherwise indicated, 
conventional techniques of molecular biology, microbiology, recombinant DNA, 
polypeptide and nucleic acid synthesis, and immunology, which are within the skill of 
the art. Such techniques are explained fully in the literature. See e.g., Sambrook, et al.,
MOLECULAR CLONING: A LABORATORY MANUAL
(1989); DNA CLONING, VOLUMES I AND II (D.N. Glover ed. 1985);
OLIGONUCLEOTIDE SYNTHESIS (M.J. Gait ed. 1984); NUCLEIC ACID
HYBRIDIZATION (B.D. Hames & S.J. Higgins eds. 1984); TRANSCRIPTION AND
TRANSLATION (B.D. Hames & S.J. Higgins eds. 1984); ANIMAL CELL CULTURE
(R.I. Freshney ed. 1986); IMMOBILIZED CELLS AND ENZYMES (IRL Press, 1986);
B. Perbal, A PRACTICAL GUIDE TO MOLECULAR CLONING (1984); the series,
METHODS IN ENZYMOLOGY (Academic Press, Inc.); GENE TRANSFER
VECTORS FOR MAMMALIAN CELLS (J.H. Miller and M.P. Calos eds. 1987, Cold
Spring Harbor Laboratory); Methods in Enzymology Vol. 154 and Vol. 155 (Wu and
Grossman, and Wu, eds., respectively); Mayer and Walker, eds. (1987),
IMMUNOCHEMICAL METHODS IN CELL AND MOLECULAR BIOLOGY
(Academic Press, London); Scopes, (1987), PROTEIN PURIFICATION: PRINCIPLES
AND PRACTICE, Second Edition (Springer-Verlag, N.Y.); and HANDBOOK OF
EXPERIMENTAL IMMUNOLOGY, VOLUMES I-IV (D.M. Weir and C.C. Blackwell
eds. 1986).

Standard abbreviations for nucleotides and amino acids are used in this 
specification. All publications, patents, and patent applications cited herein are 
incorporated by reference.

The term "recombinant polynucleotide" as used herein intends a polynucleotide
of genomic, cDNA, semisynthetic, or synthetic origin which, by virtue of its origin or
manipulation: (1) is not associated with all or a portion of a polynucleotide with which it is
associated in nature, (2) is linked to a polynucleotide other than that to which it is linked
in nature, or (3) does not occur in nature.

The term "polynucleotide" as used herein refers to a polymeric form of
nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term
refers only to the primary structure of the molecule. Thus, this term includes double- and
single-stranded DNA and RNA. It also includes known types of modifications, for
example, labels which are known in the art, methylation, "caps", substitution of one or
more of the naturally occurring nucleotides with an analog, internucleotide modifications
such as, for example, those with uncharged linkages (e.g., methyl phosphonates,
phosphotriesters, phosphoamidates, carbamates, etc.) and with charged linkages (e.g.,
phosphorothioates, phosphorodithioates, etc.), those containing pendant moieties, such
as, for example, proteins (including, e.g., nucleases, toxins, antibodies, signal peptides,
poly-L-lysine, etc.), those with intercalators (e.g., acridine, psoralen, etc.), those
containing chelators (e.g., metals, radioactive metals, boron, oxidative metals, etc.), those
containing alkylators, those with modified linkages (e.g., alpha anomeric nucleic acids,
etc.), as well as unmodified forms of the polynucleotide.

By "PCR" is meant herein the polymerase chain reaction (PCR) technique, 
disclosed by Mullis in U.S. Pat. Nos. 4,683,195 (Mullis et al) and 4,683,202, 
incorporated herein by reference. In the PCR technique, short oligonucleotide primers 
are prepared which match opposite ends of a desired sequence. The sequence between 
the primers need not be known. A sample of DNA (or RNA) is extracted and denatured
(preferably by heat). Then, oligonucleotide primers are added in molar excess, along 
with dNTPs and a polymerase (preferably Taq polymerase, which is stable to heat). The 
DNA is replicated, then again denatured. This results in two "long products," which 
begin with the respective primers, and the two original strands (per duplex DNA 
molecule). The reaction mixture is then returned to polymerizing conditions (e.g., by 
lowering the temperature, inactivating a denaturing agent, or adding more polymerase), 
and a second cycle initiated. The second cycle provides the two original strands, the two 
long products from cycle 1, two new long products (replicated from the original strands), 
and two "short products" replicated from the long products. The short products have the 
sequence of the target sequence (sense or antisense) with a primer at each end. On each 
additional cycle, an additional two long products are produced, and a number of short 
products equal to the number of long and short products remaining at the end of the 
previous cycle. Thus, the number of short products grows exponentially with each cycle. 
This amplification of a specific analyte sequence allows the detection of extremely small 
quantities of DNA. 
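
The product bookkeeping described above can be tallied with a short sketch. It assumes an idealized reaction in which every strand is copied exactly once per cycle and nothing is lost; the function name `pcr_products` and its parameters are illustrative and do not come from the cited patents.

```python
def pcr_products(cycles, duplexes=1):
    """Return (original, long, short) strand counts after ideal PCR cycles.

    Assumes 100% efficiency: every strand present at the start of a
    cycle serves as template for exactly one new strand.
    """
    original = 2 * duplexes  # the two strands of each starting duplex persist
    long_products = 0
    short_products = 0
    for _ in range(cycles):
        # Every long or short product templates one new short product.
        new_short = long_products + short_products
        # Every original strand templates one new long product.
        new_long = original
        long_products += new_long
        short_products += new_short
    return original, long_products, short_products


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, pcr_products(n))
```

Under these assumptions the long products grow by two per cycle, while the short products roughly double each cycle, which is the exponential growth the paragraph describes.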

The term "3SR" as used herein refers to a method of target nucleic acid
amplification also known as the "self-sustained sequence replication" system as
described in European Patent Publication No. 373,960 (published June 20, 1990).

The term "LCR" as used herein refers to a method of target nucleic acid
amplification also known as the "ligase chain reaction" as described by Barany, Proc.
Natl. Acad. Sci. (USA) (1991) 88:189-193.

An "open reading frame" (ORF) is a region of a polynucleotide sequence which 
encodes a polypeptide; this region may represent a portion of a coding sequence or a total 
coding sequence. 

A "coding sequence" is a polynucleotide sequence which is translated into a
polypeptide, usually via mRNA, when placed under the control of appropriate regulatory
sequences. The boundaries of the coding sequence are determined by a translation start
codon at the 5'-terminus and a translation stop codon at the 3'-terminus. A coding sequence
can include, but is not limited to, cDNA, and recombinant polynucleotide sequences.

As used herein, the term "polypeptide" refers to a polymer of amino acids and does
not refer to a specific length of the product; thus, peptides, oligopeptides, and proteins are
included within the definition of polypeptide. This term also does not refer to or exclude
post expression modifications of the polypeptide, for example, glycosylations, acetylations,
phosphorylations and the like. Included within the definition are, for example,
polypeptides containing one or more analogs of an amino acid (including, for example,
unnatural amino acids, etc.), polypeptides with substituted linkages, as well as other
modifications known in the art, both naturally occurring and non-naturally occurring.

A polypeptide or amino acid sequence "derived from" a designated nucleic acid
sequence refers to a polypeptide having an amino acid sequence identical to that of a
polypeptide encoded in the sequence, or a portion thereof wherein the portion consists of at
least 3-5 amino acids, and more preferably at least 8-10 amino acids, and even more
preferably at least 11-15 amino acids, or which is immunologically identifiable with a
polypeptide encoded in the sequence. This terminology also includes a polypeptide
expressed from a designated nucleic acid sequence.

The protein may be used for producing antibodies, either monoclonal or polyclonal,
specific to the protein. The methods for producing these antibodies are known in the art.

"Recombinant host cells", "host cells," "cells," "cell cultures," and other such terms
denote, for example, microorganisms, insect cells, and mammalian cells, that can be, or
have been, used as recipients for recombinant vector or other transfer DNA, and include the
progeny of the original cell which has been transformed. It is understood that the progeny
of a single parental cell may not necessarily be completely identical in morphology or in
genomic or total DNA complement to the original parent, due to natural, accidental, or
deliberate mutation. Examples of mammalian host cells include Chinese hamster ovary
(CHO) and monkey kidney (COS) cells.

By "cDNA" is meant a complementary mRNA sequence that hybridizes to a
complementary strand of mRNA.

By "purified" and "isolated" is meant, when referring to a polypeptide or nucleotide 
sequence, that the indicated molecule is present in the substantial absence of other 
biological macromolecules of the same type. The term "purified" as used herein preferably 
means at least 75% by weight, more preferably at least 85% by weight, more preferably 
still at least 95% by weight, and most preferably at least 98% by weight, of biological 
macromolecules of the same type present (but water, buffers, and other small molecules,
especially molecules having a molecular weight of less than 1000, can be present).

By "pharmaceutically acceptable carrier" is meant any pharmaceutical carrier that
does not itself induce the production of antibodies harmful to the individual receiving the
composition. Suitable carriers are typically large, slowly metabolized macromolecules
such as proteins, polysaccharides, polylactic acids, polyglycolic acids, polymeric amino
acids, amino acid copolymers, and inactive virus particles. Such carriers are well known to
those of ordinary skill in the art.

The therapeutic compositions typically will contain pharmaceutically acceptable
vehicles, such as water, saline, glycerol, ethanol, etc. Additionally, auxiliary substances, 
such as wetting or emulsifying agents, pH buffering substances, and the like, may be 
present in such vehicles. 

Typically, the therapeutic compositions are prepared as injectables, either as liquid
solutions or suspensions; solid forms suitable for solution in, or suspension in, liquid
vehicles prior to injection may also be prepared. The preparation also may be emulsified or
encapsulated in liposomes for enhanced adjuvant effect.

Evidence indicates that different HCV genotypes may have different
pathogenicities as well as distinct geographical distributions and may elicit partly
different serological profiles in infected patients. See Cammarota, et al., J. Clin. Microb.
(1995) 33:2781-2784. The invention includes methods for detecting HCV and
identifying infection by different types of HCV. The invention includes genotyping
HCV, the potential to discover a new genotype of HCV, and assessing viral populations
to predict response to drug therapy. The invention also includes probes for use
in the genotyping of HCV.

The methods for genotyping HCV include but are not limited to a heteroduplex
tracking or mobility assay utilizing probes/primers from the core/E1 region of the HCV
genome. The documented antigenic differences between HCV genotypes would be
useful not only in blood donor screening and in predicting response to IFN
treatment, but also for designing the composition of candidate vaccines for HCV in
different countries, for the choice of therapeutic vaccines, and for the identification of new
genotypes. Other methods have been proposed to identify the main genotypes infecting
populations, based on analysis of different regions of the genome, such as RFLP. See
Davidson et al., J. Gen. Virol. (1995) 76:1197-1204 for discussion of genotyping HCV
using RFLP of sequences amplified from the 5' non-coding region (NCR).

The known nucleic acid based methods of genotyping require (1) sub-type specific
RT-PCR (reverse transcriptase-PCR) primers (see Okamoto (1992) J. Gen. Virol. 73:673-679;
U.S. Patent 5,427,909); (2) specific probes (G. Marteen, et al., line probe assay); (3)
restriction site polymorphism (a function of the nucleotide (nt) sequence); or (4) direct
sequencing to determine genotype. The analysis of the 5' NC sequence with RFLP is easy
to perform, but does not accurately predict all HCV genotypes, and some subtypes may
be misclassified. For example, the change in sequence between 1a and 1b recognized by
the restriction enzyme is not absolute, and sequences other than 1a and 1b, and 2a and 2b,
are misclassified. For example, type 1c would appear as type 1a, and type 2c as either type
2a or 2b. See Cammarota et al., J. Clin. Microb. (1995) 33:2781-2784. For this reason,
RFLP is not able to detect "escape" species, new divergent species, or epidemiological 
trends. It is likely that a typing method like RFLP will have to be continuously modified 
to accommodate the rapidly increasing information collected on HCV sequence 
heterogeneity. 

As mentioned above, when using the nucleic acid based methods of genotyping,
one obtains either a type or subtype, or a negative, that is, "untypeable", result.
See, e.g., Cammarota, et al., J. Clin. Microb. (1995) 33:2781-2784, in which isolates that
remained untyped by genotype-specific PCR were classified as subtype 2c on the basis of
sequence analysis of PCR amplification products obtained from the core and NS5 genes. This
problem is avoided by using the presently claimed invention to determine HCV genotypes by
choosing RT-PCR primers in the C-terminus of core/mid 2/3 of E1. In addition, the
subtype of the isolate can be accurately determined using the present invention of HCV
genotyping, and isolates can be detected, even those less than approximately 30%
divergent, enabling the characterization of new sub-types without sequencing.

Heteroduplex Tracking or Mobility Assay

The method of determining the genotype of HCV in the present invention utilizes
techniques that detect minor variants in complex quasispecies. One such technique is the
heteroduplex tracking assay (HTA). HTA, well known in the art for use with HIV (see e.g.,
Delwart, et al., J. Virol. (1994) 68:6672-6683; Delwart, et al., Science (1993)
262:1257-1261; Delwart, et al., PCR Methods and Applications 4:S202-S216 (1995) Cold
Spring Harbor; and
Delwart, et al., Heteroduplex Mobility Analysis HIV-1 env Subtyping Kit Protocol
Version 3, each of which is incorporated herein by reference in its entirety), grew out of
the observation that when sequences were amplified by nested PCR from peripheral
blood mononuclear cells of infected individuals, related DNA products coamplified from
divergent templates could randomly reanneal to form heteroduplexes that migrate with
reduced mobility in neutral polyacrylamide gels. Using these techniques, one can
establish genetic relationships between multiple viral DNA template molecules.

HTA in particular utilizes a first PCR product as a labeled probe, which may be
radioactive, mixed with an excess (driver) of an unlabeled PCR product from a
different source, i.e., the source for which typing is desired. The probe sequences are
then driven completely into heteroduplexes with the driver, and are separated on the basis
of size. An autoradiogram, for example, of the resulting polyacrylamide gel reveals only
these heteroduplexes and provides a visual display of the relationship between the two
virus populations under study. The fact that heteroduplexes migrate with distinct
mobilities indicates that the strand-specific composition of mismatched and unpaired
nucleotides affects their mobility.

An exponential equation described in Delwart et al. is then used to describe a
curve fitting the experimental data from pairwise analysis of genes of known sequence.
In the present invention, the equation is used to estimate the genetic distance between the
known genotypes of the probes and the unknown genotypes of the patient samples.
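
The curve-fitting step can be illustrated with a minimal sketch. The exact equation of Delwart et al. is not reproduced in this text, so the sketch assumes a simple form m = a * exp(-b * d), where d is the genetic distance (percent divergence) and m is the heteroduplex mobility relative to the homoduplex. The function names and the calibration points in the demonstration are hypothetical; the points are generated from an assumed curve only to exercise the code and are not measurements.

```python
import math


def fit_exponential(points):
    """Least-squares fit of m = a * exp(-b * d) by linear regression on ln(m).

    `points` is a list of (distance, relative_mobility) calibration pairs
    from probe/driver pairs of known sequence (assumed input format).
    """
    xs = [d for d, _ in points]
    ys = [math.log(m) for _, m in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def estimate_distance(mobility, a, b):
    """Invert m = a * exp(-b * d) to estimate distance from a mobility."""
    return -math.log(mobility / a) / b


if __name__ == "__main__":
    # Synthetic calibration points from an assumed curve (a=1.0, b=0.05).
    calibration = [(d, math.exp(-0.05 * d)) for d in (0, 5, 10, 20)]
    a, b = fit_exponential(calibration)
    print(a, b, estimate_distance(0.6, a, b))
```

Once a and b are fitted from pairs of known sequence, the inverse function converts the observed mobility of a patient-sample heteroduplex into an estimated distance from the probe genotype.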

Primers for Use in the HTA

It was determined that the E1 or core region could be the best region in which to study
HCV heterogeneity; thus the E1 region became the choice for primers in the present
invention. The use of the partial E1 sequence, the most heterogeneous region of the
genome for the present invention, as well as a longer fragment, i.e., 400 nt, although it
could have been as long as 1000 nt, enabled the design of probes which do not cross
hybridize among sub-types/types and thus allow accurate genotyping. By flanking the
heterogeneous region, conserved nt sequences for sense and antisense primers were
identified. Preferably, a combination of universal sense and type specific antisense
primers for the first PCR round and universal antisense and type specific sense primers
for the second round were utilized. The PCR need not be two rounds and the primers are
not limited to the above-described combination. The preferred combination, however,
enabled the preparation of single stranded probes and minimized the number of PCR
primer combinations.

Preferred probes are sequences in the core and E1 regions, for which the
sequences of a wide range of genotypes are published and grouped into at least 12
distinct genotypes and subtypes: I/1a, II/1b, III/2a, IV/2b, 2c, 3a, 4a, 4b, 4c, 4d, 5a, 6a.
The nucleotide sequence identities of the E1 gene among HCV isolates of the same
genotype range from 88.0% to 99.1%, whereas those of HCV isolates of different
genotypes range from 53.5% to 78.6%. The degree of variation for good discrimination of
heteroduplexes in neutral polyacrylamide gels is comfortably within the range of 3-20%,
so that it is likely that divergent templates reanneal to form a heteroduplex if they are of the
same subtype. For this reason, a single stranded 32P labelled DNA probe was used so
that if the formation of the heteroduplex is impossible, the ss-DNA probe could likely
not reanneal and form a homoduplex band. Without direct sequencing, the present
invention can rapidly give not only a certain identification of the subtypes, but also the
genetic relations inside the same subtypes. For example, the genotypes analyzed, i.e.,
(1a, 1b, 2a, 2b, 3a) showed no overlapping between different subtypes.
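
The identity ranges quoted above can be applied to a pair of aligned E1 sequences with a short sketch. It assumes the two sequences are already aligned to equal length; the function names and the example strings are illustrative only and are not HCV sequences.

```python
# Identity ranges quoted in the text for the E1 gene (percent).
SAME_GENOTYPE = (88.0, 99.1)
DIFFERENT_GENOTYPE = (53.5, 78.6)


def percent_identity(seq_a, seq_b):
    """Percent of positions that match between two pre-aligned sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b)
    return 100.0 * matches / len(seq_a)


def classify(identity):
    """Place an identity value relative to the two quoted ranges."""
    if identity >= SAME_GENOTYPE[0]:
        return "same genotype"
    if DIFFERENT_GENOTYPE[0] <= identity <= DIFFERENT_GENOTYPE[1]:
        return "different genotype"
    # Values between or below the reported ranges are left undecided.
    return "indeterminate"


if __name__ == "__main__":
    identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")  # toy strings
    print(identity, classify(identity))
```

The gap between 78.6% and 88.0% is reported as indeterminate here because the text gives no rule for it; a real typing scheme would need a stated policy for such values.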

Further, since isolates approximately 30% divergent can be visualized on the gel,
new subtypes can be visualized, the distribution of isolates in a population can be
characterized, and populations or individual isolates can be followed in populations or in
individuals in epidemiological studies.

HCV Genotyping Kits

A kit for determining the genotype of HCV is within the scope of this invention.
As described for HIV in Delwart et al., Heteroduplex Mobility Analysis HIV-1 env
Subtyping Kit Protocol Version 3, such a kit would include the specific primers.
Preferred primers are from the core and E1 region of the HCV genome. If two stages of
PCR are desired, the first round primers could include, for example, a universal sense
probe, preferably located in the core/E1 region of the HCV genome. One such universal
primer is located from nucleotide 508 to 529 of HCV-1 and is shown in Table 1.
Coupled with the universal primer could be a type specific antisense primer also
preferably located in the core/E1 region of the HCV genome. Examples of these primers
are from nucleotides 1032 to 1012 for type 1, type 2a, type 2b and type 3a of the HCV
genomes and are also shown in Table 1.

If a second round of PCR is desired, the second round primers would likewise be
from the core/E1 region of the HCV genome. Preferred second round primers could
include a universal antisense primer from nucleotides 978 to 958 of the HCV-1 genome;
this primer is shown in Table 1. In addition, the second round primers could include a
type specific sense primer from the core/E1 region. Preferred second round type specific
sense primers are from nucleotides 536 to 557 of HCV genomes type 1, type 2 or type 3,
and are shown in Table 1.
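
The nesting of the two primer pairs can be checked with simple coordinate arithmetic. The sketch below uses the HCV-1 nucleotide positions given in the text (first round 508 and 1032, second round 536 and 978) and assumes the amplicon spans the outermost 5' positions of each pair inclusively; the helper names are illustrative.

```python
def amplicon(sense_5prime, antisense_5prime):
    """Return (start, end, length in nt) spanned by a primer pair, inclusive."""
    return sense_5prime, antisense_5prime, antisense_5prime - sense_5prime + 1


def is_nested(inner, outer):
    """True if the inner amplicon lies entirely within the outer one."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


# First round: universal sense 508-529, type specific antisense 1032-1012.
first_round = amplicon(508, 1032)
# Second round: type specific sense 536-557, universal antisense 978-958.
second_round = amplicon(536, 978)

if __name__ == "__main__":
    print(first_round, second_round, is_nested(second_round, first_round))
```

On these coordinates the first round product spans 525 nt and the second round product 443 nt, with the second lying wholly inside the first.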

The first or second round of primers may be sufficient to amplify the viral RNA
without using a second round of PCR if the concentration of the virus is sufficiently high,
i.e., nested PCR is not necessarily required; what is required is PCR products in 100x
excess of probe.

An HCV genotyping kit of the present invention would also include subtype 
references which may change as new subtypes are discovered and evaluated for use in 
the kit. Use of more than one reference from a given subtype is recommended because 
comparison to a single reference does not always provide an unambiguous result.

The foregoing discussion and following examples only illustrate the invention;
persons of ordinary skill in the art will appreciate that the invention can be implemented
in other ways, and the invention is defined solely by reference to the claims.

Example 1

Patient samples

Thirty-five patients undergoing regular hemodialysis were studied: 20 men
(57%) and 15 women (43%) with a mean age of 64.8 ± 13 years. Serum samples were
collected in August 1995, divided into aliquots and stored at -80 degrees Celsius. 26
patients were anti-HCV ELISA positive and 9 anti-HCV ELISA negative. 25 of the 26
ELISA positive patients were also RIBA III positive, while 1 was indeterminate. The 9 ELISA
negative patients were all RIBA III negative. 15 patients were HCV-RNA 5' UTR and E1 PCR
positive. By direct sequencing of the 15 5' NCR products, 5 patients were found to be
type 1; 3 patients type 2; and 7 patients type 3.

Example 2
cDNA and PCR

HCV-RNA was extracted at least two different times using a Stratagene reagent
from a Stratagene RNA Isolation Kit (Chomczynski and Sacchi method).

RNA extracted from 20 ul of plasma was reverse transcribed in 25 ul of
cDNA mixture (BRL cDNA synthesis kit, 8085SB) using 100 pmol of PCR primers.
The cDNA mixture was boiled for 5 minutes, quick-cooled on ice and added to the PCR
cDNA reagents with final concentrations according to the Perkin PCR kit (N801-0055)
specification. 40 PCR cycles (94 degrees Celsius for 10 seconds, 55 degrees Celsius for
30 seconds and 72 degrees Celsius for 30 seconds) were performed. Ten ul of the first
PCR reaction mixture was added to a second PCR reaction mixture containing nested
PCR primers and was amplified for 40 cycles as indicated above.
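
The cycling programme above can be written down as data and its programmed hold time totalled. This is a sketch only: the names are illustrative, and ramp times between temperatures, which depend on the thermal cycler, are not specified in the text and are ignored.

```python
CYCLES = 40
# (step name, temperature in degrees Celsius, hold time in seconds)
STEPS = [
    ("denature", 94, 10),
    ("anneal", 55, 30),
    ("extend", 72, 30),
]


def hold_time_seconds(cycles, steps):
    """Total programmed hold time for one PCR round, excluding ramping."""
    return cycles * sum(seconds for _, _, seconds in steps)


if __name__ == "__main__":
    per_round = hold_time_seconds(CYCLES, STEPS)
    print(per_round, "s per round;", 2 * per_round, "s for both nested rounds")
```

Each cycle holds for 70 seconds in total, so one 40-cycle round amounts to 2800 seconds of programmed hold time and the two nested rounds to 5600 seconds.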

The first extraction was used for the nested-PCR reaction with primers specific
for the 5' NCR as previously described in Shimizu et al., PNAS (1992) 5477-5481, and
this product was directly sequenced and used for the RFLP. RNA from the same
extraction was used for HTA using core/E1 primers. A second RNA extraction was
performed for RFLP and/or HTA to confirm the results. The primers used for the HTA
are listed in Table 1. The nested pairs of PCR primers used to obtain these E1 products
were different for the types 1, 2a, 2b, and 3a. The universal sense probe for the first
round of amplification corresponds to 5'-3' nt 508-529, amino acids 170-176, of Choo, et
al., PNAS, 1991, while the universal antisense primer for the second round of
amplification corresponds to nt 978-958, amino acids 320-326, of Choo, et al., PNAS,
1991.

When the ssDNA probes were prepared for use in the HTA, one of the 
primers for the nested PCR was biotinylated. See, e.g., SEQ ID NO:6 in Table 1. 
Table 1 

HCV-1                             5' -> 3' nt   3' -> 5' nt   Amino Acid   Primer Type 

(SEQ ID NO:1) Purified, C170S     508-529       529-508       170-176      Universal sense probe PCR I 
(SEQ ID NO:2) Purified, E338A1    1032-1012     1012-1032     338-344      Type 1 antisense PCR I 
(SEQ ID NO:3) Purified, E338A2a   1032-1012     1012-1032     338-344      Type 2a antisense PCR I 
(SEQ ID NO:4) Purified, E338A2b   1032-1012     1012-1032     338-344      Type 2b antisense PCR I 
(SEQ ID NO:5) Purified, E338A3a   1032-1012     1012-1032     338-344      Type 3a antisense PCR I 
(SEQ ID NO:6) Purified, E320A     978-958       958-978       320-326      Universal antisense PCR II 
(SEQ ID NO:7) Purified, C179S1    536-557       958-978       179-186      Type 1 sense PCR II 
(SEQ ID NO:8) Purified, C179S2    536-557       958-978       179-186      Type 2 sense PCR II 
(SEQ ID NO:9) Purified, C179S3    536-557       958-978       179-186      Type 3 sense PCR II 
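
Because antisense primers are written 5' to 3' on the strand opposite the genome, it can help to convert them to the sense orientation when checking them against the listed coordinates. The following sketch uses the SEQ ID NO:6 primer (E320A) as given in the Sequence Listing; `reverse_complement` is a hypothetical helper written for illustration and is not part of any method described here.

```python
# Complement table for unambiguous DNA bases only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    seq = seq.upper()
    unexpected = set(seq) - set("ACGT")
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return seq.translate(_COMPLEMENT)[::-1]


# SEQ ID NO:6 (E320A), universal antisense primer for PCR II, written 5' -> 3'.
E320A = "CCAGTTCATCATCATATCCCA"

if __name__ == "__main__":
    # Sense-strand sequence of the primer binding site.
    print(reverse_complement(E320A))
```

The 21 nt result spans seven codons, which matches the amino acid range 320-326 given for this primer in Table 1.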



Example 3 

HTA 

The single stranded probes were prepared by RT-PCR of HCV ELISA and RIBA 
positive sera of known genotypes with the same PCR primers described above, except 
that one of the primers, E320A, was biotinylated. ssDNA probes were generated with 
Dynabeads M-280 Streptavidin following the protocol of Heng Pan and Eric Delwart. 
The non-biotinylated single strand was eluted from the magnetic bead/streptavidin 
column. Probes were generated from 20 ng of ssDNA of the different genotypes, end 
labeled using T4 polynucleotide kinase (Gibco BRL) and 100 microCi of 32P ATP, and 
then column purified. The kinased probe was separated from 32P ATP using a Pharmacia 
Bio Sepharose column. The 32P-labeled single strand probes were mixed with a 100-fold 
excess driver and the [...] control serum/plasma. Hybridization was in 2 x SSC. The 
mixtures were put on a 94 degree Celsius heat block for 3 minutes. They were then 
transferred to a 55 degree Celsius heat block for at least 2 hours. The entire reaction 
volume was loaded on a 1 mm thick, 6% polyacrylamide MDE gel (Baker) and 
electrophoresed for 16 h at 500 V. The gel was vacuum dried at 80 degrees Celsius on 
filter paper and exposed to X-ray film. The genotypes of each of the samples were 
determined based on the Delwart method. Table 2 depicts the genotype results 
determined by using HTA. 

WO 97/40190 PCT/US97/06062 

Figures 1A-1E are autoradiograms showing each of the single strand probes in 
Table 1, that is, the probes specific for known genotypes 1a, 1b, 2a, 2b and 3a in 
Figures 1A-1E respectively; see the lane on the far left of the MDE gel. The homoduplex 
(h) (ss probe hybridized to the double stranded RT-PCR product from which it was 
derived) is shown adjacent to the probe. RT-PCR products from the 15 dialysis patients 
(nos. 1, 2, 3, 4, 7, 18, 20, 22, 23, 24, 26, 28, 30, 33, 35) hybridized to the probe are 
also designated by the appropriate lane in each Figure. 

As can be seen in Figures 1A-1E, the type 1 ss sub-type probes were specific for 
each type 1 sub-type and did not cross-hybridize with the other sub-types 1b, 2a, 2b, 3a 
(2a, 2b not shown). The type 3a ss sub-type specific probe was also specific for sub-type 
3a and did not cross-hybridize with 1a, 2c, or 2a, 2b isolates (data not shown). The ss 
sub-type 2 probes did not cross-hybridize with each other (data not shown) but did 
cross-hybridize with sub-type 2c isolates; however, the distance between the homoduplex 
and the 2c isolates indicates a high degree of divergence, suggesting that patients 23, 30 
and 33 had different sub-types. The virus in sera 23, 30 and 33 was confirmed by 
sequencing the partial E1 to be most closely related to sub-type 2c (see Figure 2b) but 
was ambiguous by 5' UTR sequencing (see Figure 2D). 

Isolates 23, 30 and 33 hybridized with the 2a probe, while only 30 and 33 
hybridized to the 2b probe. The gels also indicate that isolate 30 is more closely related 
to 2a than to 2b. Therefore, while all three sera are clearly a type 2 non-a, non-b 
sub-type, they are not all equally divergent from types 2a and 2b. As seen in Figures 1B 
and 1D, patient 4 appears to be co-infected with type 1b and a non-a, non-b sub-type. 

The 1b probe was derived from a patient (JK 16) who appeared to have two viral 
genomes, which is reflected in the homoduplex lane (h); therefore each 1b patient has 
two bands. 
The ss probe 3a was derived from a plasmid clone of one RT-PCR product from a 
type 3a individual (JK3a), see Fig. 1E, lane h; therefore, multiple bands in lane 22 most 
likely reflect two closely related viruses in this patient. 

It appeared that most often patients had unique viral isolates. It is possible that 
patients 3 and 18 had identical or highly related virus isolates. Similarly, patients 20 and 
26 had the same type 3a viral isolate and patients 2 and 4 had the same type 1b isolate, 
based on the co-migration of the bands on MDE gels. 

Figures 2a-2c depict phylogenetic trees (dendrograms) showing the genetic 
relatedness of each of the partial E1 nucleotide sequences. These dendrograms were 
constructed by pairwise progressive alignment of the nucleotide sequences to one another 
using the computer software program GeneWorks (Unweighted Pair Group Method 
with Arithmetic Mean), as described in Weiner et al., J. Virol. 67:4365-4368 (1993). 
The dendrograms in Figures 2a-2c were formed by comparing partial E1 sequences of 
putative type 1 (nt 625-930), type 2 (nt 583-915) or type 3 (nt 558-834) isolates from the 
dialysis patients, as determined by sequence analysis, to published genotype sequences 
for type 1a (HCV-1) (Choo et al., PNAS 1991); 1b (HCV-J) (Kato et al.); 2a (HC-J6) 
(Okamoto et al., 1992); 2b (HC-J8) (Okamoto et al., 1992); 2c (Bukh et al., PNAS 
1993) and 3a (NZL-1) (Sakamoto et al., 1994) over the same region of the genome. 
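
For readers unfamiliar with the clustering step, the following is a minimal sketch of Unweighted Pair Group Method with Arithmetic Mean clustering on a small distance matrix. It is not the GeneWorks implementation, whose internals are not described here; the labels `iso`, `refA`, `refB` and `refC` and all distances are hypothetical values chosen only to make the example easy to follow.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster labels by average linkage.

    distances maps frozenset({a, b}) to a pairwise distance.
    Returns a nested tuple (left, right, node_height).
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of leaves
    d = dict(distances)
    while len(sizes) > 1:
        # Pick the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b, d[frozenset((a, b))] / 2)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        # Distance to every remaining cluster is the size-weighted mean.
        for other in sizes:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


# Hypothetical pairwise distances (fraction of differing positions).
PAIRS = {
    ("iso", "refC"): 0.2, ("refA", "refB"): 0.3,
    ("iso", "refA"): 0.4, ("iso", "refB"): 0.5,
    ("refC", "refA"): 0.4, ("refC", "refB"): 0.5,
}
DIST = {frozenset(pair): value for pair, value in PAIRS.items()}

if __name__ == "__main__":
    print(upgma(["iso", "refA", "refB", "refC"], DIST))
```

In this toy input the isolate joins `refC` first, which is the kind of grouping a dendrogram makes visible; real input would be distances computed from aligned partial E1 sequences.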

Figure 2D is a dendrogram formed as described above by comparing 
partial 5' UTR sequences of isolates 23, 30 and 33 with published type 1, 2 and 3 (nt -274 
to -81) genotype sequences for the same region of the genome. 

The results of the RFLP and HTA were compared and are presented in Table 2. 
Table 2 

Comparison of Partial E1 HTA and RFLP Genotyping Results 

Patient   HTA    RFLP 

1         1b     1b 
2         1b     1b 
3         3a     3a 
4         1b     1b 
7         3a     3a 
18        3a     3a 
20        3a     3a 
22        3a     3a 
23        2?*    2a 
24        1b     1b 
26        3a     3a 
28        1b     1b 
30        2?*    2a 
33        2?*    2a 
35        3a     3a 

* sample is neither 2a nor 2b 
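
The comparison in Table 2 can be tallied mechanically. The sketch below transcribes the table into a dictionary and counts agreements; the names `TABLE_2`, `discordant_patients` and `concordance` are hypothetical, and the string "2?" stands in for the table's "2?*" entry.

```python
# (HTA, RFLP) calls per patient, transcribed from Table 2.
# "2?" represents the footnoted entry: neither 2a nor 2b by HTA.
TABLE_2 = {
    1: ("1b", "1b"), 2: ("1b", "1b"), 3: ("3a", "3a"), 4: ("1b", "1b"),
    7: ("3a", "3a"), 18: ("3a", "3a"), 20: ("3a", "3a"), 22: ("3a", "3a"),
    23: ("2?", "2a"), 24: ("1b", "1b"), 26: ("3a", "3a"), 28: ("1b", "1b"),
    30: ("2?", "2a"), 33: ("2?", "2a"), 35: ("3a", "3a"),
}


def discordant_patients(table):
    """Patient numbers whose HTA and RFLP calls differ."""
    return sorted(p for p, (hta, rflp) in table.items() if hta != rflp)


def concordance(table):
    """Return (number of agreeing calls, number of patients)."""
    agree = sum(1 for hta, rflp in table.values() if hta == rflp)
    return agree, len(table)


if __name__ == "__main__":
    print(discordant_patients(TABLE_2), concordance(TABLE_2))
```

The only disagreements are the three type 2 samples, which is the discrepancy the sequencing data are used to resolve.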



The partial E1 sequences depicted in Figures 3a-3d confirm the HTA sub-type 
designations given in Table 2 and definitively show that patients 23, 30 and 33 are most 
closely related to 2c, with 33 being the most distantly related to 2c (18.6% divergent). 
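
The specification does not state how percent divergence was calculated. One common and simple choice, assumed here purely for illustration, is the uncorrected fraction of mismatched positions between two aligned sequences of equal length. The sketch below also shows how a line copied from the Sequence Listing can be cleaned of its spacing and position counter; `clean` and `percent_divergence` are hypothetical helpers, and the inputs are just the first 60 nt of SEQ ID NO:10 and SEQ ID NO:11, not a comparison reported in the text.

```python
import re


def clean(listing_line):
    """Keep only A/C/G/T, dropping spaces and the trailing position counter.

    Note: ambiguity codes such as N would also be dropped by this filter.
    """
    return re.sub(r"[^ACGT]", "", listing_line.upper())


def percent_divergence(a, b):
    """Uncorrected percentage of mismatches between equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(x != y for x, y in zip(a, b))
    return 100.0 * mismatches / len(a)


SEQ10_FIRST = clean(
    "AACTCAAGCA TTGTGTATGA AGCGGCGGAC ATGATCATGC ACACCCCCGG GTGCGTGCCA 60"
)
SEQ11_FIRST = clean(
    "AACTCAAGCA TCGTGTATGA GGCAGCGGAA GTGATCATGC ACATTCCCGG GTGCGTGCCC 60"
)

if __name__ == "__main__":
    print(f"{percent_divergence(SEQ10_FIRST, SEQ11_FIRST):.2f}% over 60 nt")
```

A full comparison would use the complete aligned region rather than a single 60 nt line, and a different distance correction could give a slightly different figure.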

The RFLP results using ScrFI (see Davidson et al., J. Gen. Virol. (1995) 
76:1197-1204) wrongly designated 23, 30 and 33 as type 2a. This wrong designation is 
reflected in Figure 2D, which shows that, based on the 5' UTR nt sequence, the computer 
did not accurately sub-type HCV 2c due to insufficient nt divergence in this region of the 
genome. 

The present invention of HTA utilizing primers for the core and envelope region 
allowed for 3 levels of characterization of HCV genomes. The first was type specificity 
in the choice of RT-PCR primers. The second was sub-type specificity, based on 
choosing primers in the core/E1 region, and from a region greater than 400 nt, which 
resulted in a lack of cross-hybridization between sub-type probes, e.g. 1 and 3, 2a, 2b; 
and a high degree of heterogeneity to maximize differences between genotypes (lack of 
cross-hybridization). Finally, isolate specificity was determined by the distance from the 
homoduplex as exemplified in Figures 1A-1E. Other genotyping methods do not have 
the ability to analyze isolate differences. 
SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: CHIRON CORPORATION 

(ii) TITLE OF INVENTION: HETERODUPLEX TRACKING ASSAY (HTA) FOR 
GENOTYPING HCV 

(iii) NUMBER OF SEQUENCES: 52 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Chiron Corporation 

(B) STREET: 4560 Horton Street - R440 

(C) CITY: Emeryville 

(D) STATE: California 

(E) COUNTRY: USA 

(F) ZIP: 94608-2916 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: Unassigned 

(B) FILING DATE: Even date herewith 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Harbin, Alisa A. 

(B) REGISTRATION NUMBER: 33,895 

(C) REFERENCE/DOCKET NUMBER: 1226.100 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (510) 923-3274 

(B) TELEFAX: (510) 655-3542 

(C) TELEX: N/A 

(2) INFORMATION FOR SEQ ID NO:1: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1: 

CCTGGTTGCT CTTTCTCTAT CT 22 

(2) INFORMATION FOR SEQ ID NO:2: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2: 

GATCGCTTGT GGGATCCGGA G 21 
(2) INFORMATION FOR SEQ ID NO:3: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3: 

GATGACCTCG GGGACGCGCA T 21 
(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4: 

GACCAGTTCT GGAACACGAG C 21 
(2) INFORMATION FOR SEQ ID NO:5: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5: 

CAAGGTCTGG GGTAAACGCA G 21 
(2) INFORMATION FOR SEQ ID NO:6: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 

CCAGTTCATC ATCATATCCC A 21 
(2) INFORMATION FOR SEQ ID NO:7: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7: 

TGGCCCTGCT CTCTTGCTTG AC 22 
(2) INFORMATION FOR SEQ ID NO:8: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8: 

TTGCTCTTCT GTCGTGCGTC AC 22 
(2) INFORMATION FOR SEQ ID NO:9: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9: 

TTGCTCTGTT CTCTTGCTTA AT 22 
(2) INFORMATION FOR SEQ ID NO:10: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 306 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10: 

AACTCAAGCA TTGTGTATGA AGCGGCGGAC ATGATCATGC ACACCCCCGG GTGCGTGCCA 60 
TGCGTCCGGG AGGGCAATCT CTCCCGCTGC TGGGTAGCGC TCACTCCCAC GCTCGCGGCC 120 
AGAAACAGCA GCGTTCCTAC TACGACAATA CGACGCCATG TCGACTTGCT AGTAGGAGCG 180 
GCTGCTTTTT GCTCCGCCAT GTACGTGGGG GACCTCTGCG GATCTATTTT CCTCGTCTCC 240 
CAACTGTTCA CCTTCTCGCC CCGCCGGCAT CATACAGTAC AGGACTGCAA TTGCTCGATC 300 
TATCCC 306 
(2) INFORMATION FOR SEQ ID NO:11: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 306 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11: 

AACTCAAGCA TCGTGTATGA GGCAGCGGAA GTGATCATGC ACATTCCCGG GTGCGTGCCC 60 
TGCGTTCGGG AGAGCAATCT CTCCCGCTGC TGGGTAGCGC TCACCCCCAC ACTCGCGGCC 120 
AGGAACAGCA GCGTCCCCAC CACGACAATA CGACGCCACG TCGACTTGCT CGTTGGGGCG 180 
GCTGCCTTCT GCTCCGCTAT GTATGTGGGG GATCTCTGCG GATCTGTTTT CCTTGTCTCC 240 
CAACTGTTCA CCTTTTCGCC TCGCCGGCAT GAGACAGTAC AGGACTGCAA TTGTTCAATC 300 
TATCCC 306 
(2) INFORMATION FOR SEQ ID NO:12: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 306 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12: 

AAeXC^L^CAZTAGT ATATGA GGCAGCGGAC ATAATCATGC ATACCCCCGG GTGCGTGCCC 60 
TGTGTTCGGG AGGTCAACTC CTCCCGCTGC TGGGCAGCGC TCACCCCTAC GCTCGCGGCC 120 
AGGAACTCCA GCGTGCCCAC TACGACAATA CGACGCCACG TCGACTTGCT CGTTGGGGCG 180 
GCTGCTTTCT GCTCCGCTAT GTACGTGGGG GATCTATGCG GATCTGTTCT ACTTGTCTCT 240 
CAGCTGTTCA CCTTCTCACC TCGCCGGCAC GAGACAGTGC AGGACTGCAA TTGTTCAATC 300 
TATCCC 306 
(2) INFORMATION FOR SEQ ID NO:13: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 306 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13: 

AACACGAGCA TTGTGTATGA GGCAGCGGAC TTGATCATGC ACGTCCCCGG GTGCGTGCCC 60 
TGCGTTCGGG AGGGCAACTC CTCCCGATGC TGGGTAGCGC TCACTCCCAC GATCGCGGCC 120 
AGGAACAGCA GTGTCCCCGT TACGACCATA CGACGCCACG TCGATTTGCT CGTTGGGGCG 180 
GCTGCTCTTT GCTCCGCCAT GTACGTGGGG GATCTCTGCG GATCTGTCTT CCTCGCTTCC 240 
CAGTTGTTCA CTTTCTCGCC TCGCCAGCAT CAGACGGTAC AGGACTGCAA CTGCTCAATC 300 
TATCCC 306 
(2) INFORMATION FOR SEQ ID NO:14: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 306 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14: 

AACTCAAGCA TCGTGTATGA GGCGGCGGAA GTGATCATGC ACATTCCTGG GTGCGTGCCC 60 
TGCGTTCGGG AGGGCGACTT CTCCCGCTGC TGGGTAGCGC TCACCCCCAC ACTCGCGGCC 120 
AGGAATAACA GCGTCCCCAC TACGACAATA CGACGCCACG TCGACTTGCT CGTTGGGGCG 180 
GCTGCCTTCT GCTCCGCTAT GTACGTGGGG GATCTCTGCG GATCTGTTTT CCTTGTCTCC 240 
CAACTGTTCA CCTTTTCGCC TCGCCGGCAT GCGACAGTAC AGGACTGCAA TTGTTCAATC 300 
TATCCC 306 

(2) INFORMATION FOR SEQ ID NO:15: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 306 base pairs 




(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15: 

AACTCGAGTA TTGTGTACGA GGCGGCCGAT GCCATCCTGC ACACTCCGGG GTGCGTCCCT 60 
TGCGTTCGTG AGGGCAACGC CTCGAGGTGT TGGGTGGCGA TGACCCCTAC GGTGGCCACC 120 
AGGGATGGCA AACTCCCCGC GACGCAGCTT CGACGTCACA TCGATCTGCT TGTCGGGAGC 180 
GCCACCCTCT GTTCGGCCCT CTACGTGGGG GACCTATGCG GGTCTGTCTT TCTTGTCGGC 240 
CAACTGTTCA CCTTCTCTCC CAGGCGCCAC TGGACGACGC AAGGTTGCAA TTGCTCTATC 300 
TATCCC 306 
(2) INFORMATION FOR SEQ ID NO:16: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 306 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16: 

AACTCAAGTA TTGTGTATGA GGCAGCGGAC ATGATCATGC ACACCCCCGG GTGCGTGCCC 60 
TGCGTCCGGG AGAGTAATTT CTCCCGTTGC TGGGTAGCGC TCACTCCCAC GCTCGCGGCC 120 
AGGAACAGCA GCATCCCCAC CACGACAATA CGACGCCACG TCGATTTGCT CGTTGGGGCG 180 
GCTGCTCTCT GTTCCGCTAT GTACGTTGGG GATCTCTGCG GATCCGTTTT TCTCGTCTCC 240 
CAGCTGTTCA CCTTCTCACC TCGCCGGTAT GAGACGGTAC AAGATTGCAA TTGCTCAATC 300 
TATCCC 306 
(2) INFORMATION FOR SEQ ID NO:17: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 306 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17: 

AATGATAGCA TTACCTGGCA ACTCCAGGCT GCTGTCCTCC ACGTCCCCGG GTGCGTCCCG 60 
TGCGAGAAAG TGGGGAATAC ATCTCGGTGC TGGATACCGG TCTCACCGAA TGTGGCCGTG 120 
CAGCAGCCCG GCGCCCTCAC GCAGGGCTTA CGGACGCACA TTGACATGGT TGTGATGTCC 180 
GCCACGCTCT GCTCCGCTCT TTACGTGGGG GACCTCTGCG GTGGGGTGAT GCTTGCAGCC 240 
CAGATGTTCA TTGTCTCGCC ACAGCACCAC TGGTTTGTGC AAGACTGCAA TTGCTCCATC 300 
TACCCT 306 



(2) INFORMATION FOR SEQ ID NO:18: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 306 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18: 

AACAACAGCA TCACCTGGCA GCTCACTGAC GCAGTTCTCC ATCTTCCTGG ATGCGTCCCA 60 
TGTGAGAATG ATAATGGCAC CTTGCATTGC TGGATACAAG TAACACCCAA CGTGGCTGTG 120 
AAACACCGCG GTGCGCTCAC TCGTAGCCTG CGAACACACG TCGACATGAT CGTAATGGCA 180 
GCTACGGCCT GCTCGGCCTT GTATGTGGGA GATGTGTGCG GGGCCGTGAT GATTCTATCG 240 
CAGGCTTTCA TGGTATCACC ACAACGCCAC AACTTCACCC AAGAGTGCAA CTGTTCCATC 300 
TACCAA 306 



(2) INFORMATION FOR SEQ ID NO:19: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 306 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19: 

AATAGCAGTA TTGTGTATGA GGCCGATGAT GTCATTCTGC ACACACCCGG CTGTGTACCT 60 
TGTGTCCAGG ACGGCAATAC ATCTACGTGC TGGACCCCAG TGACACCTAC AGTGGCAGTC 120 
AGGTACGTCG GAGCAACTAC TGCTTCGATA CGCAGTCATG TGGACCTATT AGTAGGCGCG 180 
GCCACGATGT GCTCTGCGCT CTACGTGGGT GATATGTGTG GGGCTGTCTT TCTCGTGGGA 240 
CAAGCCTTCA CGTTCAGACC TCGACGCCAT CAAACGGTCC AGACCTGTAA CTGCTCGCTG 300 
TACCCA 306 
(2) INFORMATION FOR SEQ ID NO:20: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20: 

CGCAACTCCA CGGGGCTTTA CCACGTCACC AATGATTGCC CTAACTCGAG TATTGTGTAC 60 
GAGACGGCCG ATGCCATCCT GCACACTCCG GGGTGCGTCC CTTGTGTTCG CGAGGGCAAC 120 
GCCTCGAGGT GTTGGGTGGC GATGACCCCT ACGGTGGCCA CCAGGGATGG CAAACTCCCC 180 
GCGACGCAGC TTCGACGTCA CATCGATCTG CTTGTCGGGA GCGCCACCCT CTGTTCGGCC 240 
CTCTACGTGG GGGATCTGTG CGGGTCTGTC TTTCTTGTCG GCCAACTGTT TACCTTCTCT 300 
CCCAGGCGCC ACTGGACGAC GCAAGGTTGC AAT 333 
(2) INFORMATION FOR SEQ ID NO:21: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 

AAGAACACCA GCGACAGCTA CATGGTGACC AATGACTGCC AAAATGACAG CATCACCTGG 60 
CAGCTTGAGG CTGCGGTCCT CCACGTCCCC GGGTGCGTCC CGTGCGAGAG AGTGGGAAAT 120 
ACATCTCGGT GCTGGATACC GGTCTCACCA AACGTGGCTG TGCGGCAGCC CGGCGCCCTC 180 
ACGCAGGGCT TGCGGACGCA CATCGACATG ATTGTGATGT CCGCCACGCT CTGCTCCGCT 240 
CTCTACGTGG GGGACCTCTG TGGCGGGATG ATGCTCGCAG CCCAGATGTT CATCGTTTCG 300 
CCGCAGAACC ACTGGTTCGT GCAGGAATGC AAT 333 

(2) INFORMATION FOR SEQ ID NO:22: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

AGGAACATCA GTTCTAGCTA CTACGCCACT AATGACTGCT CGAACAACAG CATCACCTGG 60 
CAGCTCACCA ACGCAGTTCT CCACCTTCCC GGATGCGTCC CATGTGAGAA TAATAATGGC 120 
ACCTTGCATT GCTGGATACA AGTAACACCT AATGTGGCCG TAAAACATCG CGGCGCACTC 180 
ACTCACAACC TGCGGACACA TGTCGACATG ATCGTAATGG CAGCTACGGT CTGTTCGGCC 240 
TTGTACGTAG GAGACGTGTG TGGGGCTGTG ATGATTGTGT CTCAGGCCCT TATAATATCA 300 
CCAGAACACC ATAACTTCAC CCAAGAGTGC AAC 333 



(2) INFORMATION FOR SEQ ID NO:23: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23: 

AAGGACACCG GCGACTCCTA CATGCCGACC AACGATTGCT CCAACTCTAG TATCGTTTGG 60 
CAGCTTGAAG GAGCAGTGCT TCATACTCCT GGATGCGTCC CTTGTGAGCG TACCGCCAAC 120 
GTCTCTCGAT GTTGGGTGCC GGTTGCCCCC AATCTCGCCA TAAGTCAACC TGGCGCTCTC 180 
ACTAAGGGCC TGCGAGCACA CATCGATATC ATCGTGATGT CTGCTACGGT CTGTTCTGCC 240 
CTTTATGTGG GGGACGTGTG TGGCGCGCTG ATGCTGGCCG CTCAGGTCGT CGTCGTGTCG 300 
CCACAACACC ATACGTTTGT CCAGGAATGC AAC 333 
(2) INFORMATION FOR SEQ ID NO:24: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24: 

AAAAACACCA GCATCTCCTA TATGGCGACC AACGACTGCT CCAATTCCAG CATCGCTTGG 60 
CAGTTTGACG GCGCAGTGCT CCATACTCCT GGATGTGTCC CTTGCGAACG GACCGGCAAC 120 
GCGTCCCGGT GTTGGGTGCC GGTTGCCCCC AATGTGGCTA TAAGACAACC CGGCGCCCTC 180 
ACTAAGGGCA TACGAACGCA CATTGATGTC ATCGTAATGT CTGCTACGCT CTGTTCTGCC 240 
CTTTACGTGG GGGACGTGTG TGGTGCGCTG ATGATTGCCG CTCAGGTCGT CATTGTGTCT 300 
CCGCAGCATC ACCACTTTGT CCAGGACTGC AAT 333 
(2) INFORMATION FOR SEQ ID NO:25: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25: 

AAGAACACCA GCGACTCCTA CATGGCGACT AACGACTGCT CTAACTCCAG CATCGTTTGG 60 
CAGCTTGAGG ACGCAGTGCT CCATGTCCCT GGATGTGTCC CTTGTGAGAA GACTGGCAAT 120 
ACGTCTCGGT GCTGGGTGCC GGTTACCCCC AATGTGGCTA CAAGTCAACC CGGCGCTCTC 180 
ACCAGGGGCT TGCGGACGCA CATCGATGTC ATCGTGATGT CAGCCACGCT CTGCTCCGCT 240 
CTCTATGTGG GGGACGTGTG TGGCGCGTTG ACGATAGCCG CTCAGGTTGT CATCGTATCG 300 
CCACGGCACC ACCACTTTGT CCAGGACTGC AAT 333 



(2) INFORMATION FOR SEQ ID NO:26: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26: 

AAGAACACCA GCACCTCCTA CATGGTGACT AACGATTGCT CCAACTCCAG CATCGTTTGG 60 
CAACTTGAAG GCGCAGTGCT CCATGTTCCT GGATGTGTCC CTTGTGAGCA GATCGGCAAC 120 
GTGTCTCAGT GTTGGGTGCC GGTTACCCCC AATATGGCCA TAAGTACACC CGGCGCTCTC 180 
ACTAAGGGCT TGCGAACGCA CATCGACGGC ATCGTGATGT CCGCTACGCT CTGTTCTGCC 240 
CTTTATGTGG GGGACGTGTG TGGCGCGTTG ATGATAGCCG CCCAGGTCGT CATCGTATCG 300 
CCACAGCACC ACCACTTTGT CCACGACTGC AAC 333 



(2) INFORMATION FOR SEQ ID NO:27: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27: 

CGCAACTCCA CGGGGCTTTA CCACGTCACC AATGATTGCC CTAACTCGAG TATTGTGTAC 60 
GAGGCGGCCG ATGCCATCCT GCACACTCCG GGGTGCGTCC CTTGCGTTCG TGAGGGCAAC 120 
GCCTCGAGGT GTTGGGTGGC GATGACCCCT ACGGTGGCCA CCAGGGATGG CAAACTCCCC 180 
GCGACGCAGC TTCGACGTCA CATCGATCTG CTTGTCGGGA GCGCCACCCT CTGTTCGGCC 240 
CTCTACGTGG GGGACCTATG CGGGTCTGTC TTTCTTGTCG GCCAACTGTT CACCTTCTCT 300 
CCCAGGCGCC ACTGGACGAC GCAAGGTTGC AAT 333 

(2) INFORMATION FOR SEQ ID NO:28: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28: 

CGCAACGTGT CCGGGATATA CCATGTCACG AACGACTGCT CCAACTCAAG TATTGTGTAT 60 
GAGGCAGCGG ACATGATCAT GCACACCCCC GGGTGCGTGC CCTGCGTCCG GGAGAGTAAT 120 
TTCTCCCGTT GCTGGGTAGC GCTCACTCCC ACGCTCGCGG CCAGGAACAG CAGCATCCCC 180 
ACCACGACAA TACGACGCCA CGTCGATTTG CTCGTTGGGG CGGCTGCTCT CTGTTCCGCT 240 
ATGTACGTTG GGGATCTCTG CGGATCCGTT TTTCTCGTCT CCCAGCTGTT CACCTTCTCA 300 
CCTCGCCGGT ATGAGACGGT ACAAGATTGC AAT 333 
(2) INFORMATION FOR SEQ ID NO:29: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29: 

AGGAACATTA GTTCTAGCTA CTACGCCACT AATGATTGCT CAAACAACAG CATCACCTGG 60 
CAGCTCACTG ACGCAGTTCT CCATCTTCCT GGATGCGTCC CATGTGAGAA TGATAATGGC 120 
ACCTTGCATT GCTGGATACA AGTAACACCC AACGTGGCTG TGAAACACCG CGGTGCGCTC 180 
ACTCGTAGCC TGCGAACACA CGTCGACATG ATCGTAATGG CAGCTACGGC CTGCTCGGCC 240 
TTGTATGTGG GAGATGTGTG CGGGGCCGTG ATGATTCTAT CGCAGGCTTT CATGGTATCA 300 
CCACAACGCC ACAACTTCAC CCAAGAGTGC AAC 333 



(2) INFORMATION FOR SEQ ID NO:30: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 

CGGAATACGT CTGGCCTCTA CGTCCTTACC AACGACTGTT CCAATAGCAG TATTGTGTAT 60 
GAGGCCGATG ATGTCATTCT GCACACACCC GGCTGTGTAC CTTGTGTCCA GGACGGCAAT 120 
ACATCTACGT GCTGGACCCC AGTGACACCT ACAGTGGCAG TCAGGTACGT CGGAGCAACT 180 
ACTGCTTCGA TACGCAGTCA TGTGGACCTA TTAGTAGGCG CGGCCACGAT GTGCTCTGCG 240 
CTCTACGTGG GTGATATGTG TGGGGCTGTC TTTCTCGTGG GACAAGCCTT CACGTTCAGA 300 
CCTCGACGCC ATCAAACGGT CCAGACCTGT AAC 333 



(2) INFORMATION FOR SEQ ID NO:31: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 333 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31: 

AAGAACATCA GTACCGGCTA CATGGTGACC AACGACTGCA CCAATGATAG CATTACCTGG 60 
CAACTCCAGG CTGCTGTCCT CCACGTCCCC GGGTGCGTCC CGTGCGAGAA AGTGGGGAAT 120 
ACATCTCGGT GCTGGATACC GGTCTCACCG AATGTGGCCG TGCAGCAGCC CGGCGCCCTC 180 
ACGCAGGGCT TACGGACGCA CATTGACATG GTTGTGATGT CCGCCACGCT CTGCTCCGCT 240 
CTTTACGTGG GGGACCTCTG CGGTGGGGTG ATGCTTGCAG CCCAGATGTT CATTGTCTCG 300 
CCACAGCACC ACTGGTTTGT GCAAGACTGC AAT 333 

(2) INFORMATION FOR SEQ ID NO:32: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 277 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 

TCATCCAACA TCTAGTCTAG AGTGGCGGAA TACGTCTGGC CTCTATGTCC TTACCAACGA 60 
CTGTTCCAAT AACATTATTG TGTATGAGGC CGATGACGTC ATCCTGCACA CGCCCGGCTG 120 
TGTACCTTGT GTTCAGGACG GTAATACATC CAAGTGCTGG ACCCCAGTGA CACCTACAGT 180 
GGCAGTCAGG TACGTCGGAG CAACCACCGC TTCAATACGC AGCCACGTGG ACCTATTATT 240 
GGGCGCGGCC ACGATGTGCT CTGCGCTCTA CGTGGGT 277 
(2) INFORMATION FOR SEQ ID NO:33: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 277 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:33: 

TCATCCAACA TCTAGTCTAG AGTGGCGGAA TACGTCTGGC CTCTATGTCC TTACCAACGA 60 
CTGTTCCAAT AACATCATTG TGTATGAGGC CGATGACGTC ATCCTGCACG CACCCGGCTG 120 
TGTACCTTGT GTTCAGGACG GCAATACATC CACGTGCTGG ACCCCAGTGA CACCTACAGT 180 
GGCAGTCAGG TACGTCGGAG CAACCACCGC TTCAATACGC AGCCATGTGG ACCTATTAGT 240 
GGGCGCGGCC ACGATGTGCT CTGCGCTCTA CGTGGGT 277 
(2) INFORMATION FOR SEQ ID NO:34: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 277 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:34: 

TCATCCAACA TCTAGTCTAG AGTGGCGGAA TACGTCTGGC CTCTATGTCC TTACCAACGA 60 
CTGTTCCAAT AATATTATTG TGTATGAGGC CGACGACGTC ATCCTGCACG CCCCCGGCTG 120 
TGTACCTTGT GTTCAGGACG GCAATACATC CACGTGCTGG ATCCCAGTGA CACCTACAGT 180 
GGCAGTCAGG TACGCCGGAG CAACCACCGC TTCAATACGC AGCCATGTGG ACCTGTTAGT 240 
GGGCGCGGCC ACGATGTGCT CTGCGCTCTA CGTGGGT 277 
(2) INFORMATION FOR SEQ ID NO:35: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 277 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: 

TCATCCAACA TCTAGTCTAG AGTGGCGGAA TACGTCTGGC CTCTATGTCC TTACCAACGA 60 
CTGTTCCAAT AACATTATTG TGTATGAGGC CGATGACGTC ATCCTGCACA CACCCGGCTG 120 
TGTACCTTGT GTTCAGGACG GCAATACATC CACGTGCTGG ACCCCAGTGA CACCTACAGT 180 
ATCAGTCAGG TACGTCGGAG CAACCACCGC TTCAATACGC AGCCATGTGG ACCTACTATT 240 
GGGCGCGGCC ACGATGTGCT CCGCGCTCTA CGTGGGT 277 



(2) INFORMATION FOR SEQ ID NO:36: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 277 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:36: 

TCATCCAACA TCTAGTCTAG AGTGGCGGAA TACGTCTGGC CTCTATGTCC TTACCAACGA 60 
CTGTTCCAAT AACAGTATTG TGTATGAGGC CGATGACGTC ATCCTGCACA CACCCGGCTG 120 
TGTACCTTGT GTTCAAGCCA ACAATAAATC CAAATGCTGG ACCCCAGTGA CACCTACAGT 180 
ATCAGTCGAG TACGTCGGAG CAACCACCGC TTCAATACGC AGCCATGTGG ACCTACTATT 240 
GGGCGCGGCC ACGATGTGCT CTGCGCTCTA CGTGGGT 277 



(2) INFORMATION FOR SEQ ID NO:37: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 277 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 

TCATCCAAGA TCTAGTCTAG AGTGGCGGAA TACGTCTGGC CTCTATGTCC TTACCAACGA 60 
CTGTTCTAAT AACATTATTG TGTATGAGGC CGATGACGTC ATCCTGCACA CACCCGGCTG 120 
TGTACCTTGT GTTCAGGACG GCAATGCATC CACGTGCTGG ACCCCAGTAA CACCTACAGT 180 
ATCAGTCAGG TACGTCGGAG CAACCACCGC TTCAGTACGC AGCCATGTGG ACCTACTATT 240 
GGGCGCGGCC ACGATGTGCT CTGCGCTCTA TGTGGGT 277 
(2) INFORMATION FOR SEQ ID NO:38: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 277 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:38: 

TCATCCAACA TCTAGTCTAG AGTGGCGGAA TACGTCTGGC CTCTATGTCC TCACCAACGA 60 
CTGTTCCAAC AACATTATTG TGTATGAGGC CGATGACGTC ATTCTGCACA CGCCCGGCTG 120 
CGTACCTTGT GTACAGGACG GCAATACATC CACGTGCTGG ACCCCAGTGA CACCTACAGT 180 
GGCAGTCAGG TACGTCGGAG CAACTACCGC TTCAATACGC AGCCATGTGG ACCTATTATT 240 
GGGCGCGGCC ACGATGTGCT CTGCGCTCTA CGTGGGT 277 
(2) INFORMATION FOR SEQ ID NO:39: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 277 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: 

TCATCCAGCA GCCAGTCTAG AGTGGCGGAA TACGTCTGGC CTCTACGTCC TTACCAACGA 60 
CTGTTCCAAT AGCAGTATTG TGTATGAGGC CGATGATGTC ATTCTGCACA CACCCGGCTG 120 
TGTACCTTGT GTCCAGGACG GCAATACATC TACGTGCTGG ACCCCAGTGA CACCTACAGT 180 
GGCAGTCAGG TACGTCGGAG CAACTACTGC TTCGATACGC AGTCATGTGG ACCTATTAGT 240 
AGGCGCGGCC ACGATGTGCT CTGCGCTCTA CGTGGGT 277 
(2) INFORMATION FOR SEQ ID NO:40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 277 base pairs 

(B) TYPE: nucleic acid 
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(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

( i i-)— MOLEGUtE— TYPE :~DN A— (genomic )- ~~~ 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 

TGTGCCCGCT TCGGCCTACC AAGTGCGCAA CTCCACGGGG CTTTACCACG TCACCAATGA 60 

TTGCCCTAAC TCGAGTATTG TGTACGAGGC GGCCGATGCC ATCCTGCACA CTCCGGGGTG 120 

CGTCCCTTGC GTTCGTGAGG GCAACGCCTC GAGGTGTTGG GTGGCGATGA CCCCTACGGT 180

GGCCACCAGG GATGGCAAAC TCCCCGCGAC GCAGCTTCGA CGTCACATCG ATCTGCTTGT 240

CGGGAGCGCC ACCCTCTGTT CGGCCCTCTA CGTGGGG 277 
(2) INFORMATION FOR SEQ ID NO:41:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 277 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:41:

CATCCCAGCT TCCGCTTACG AGGTGCGCAA CGTGTCCGGG ATATACCATG TCACGAACGA 60

CTGCTCCAAC TCAAGTATTG TGTATGAGGC AGCGGACATG ATCATGCACA CCCCCGGGTG 120

CGTGCCCTGC GTCCGGGAGA GTAATTTCTC CCGTTGCTGG GTAGCGCTCA CTCCCACGCT 180

CGCGGCCAGG AACAGCAGCA TCCCCACCAC GACAATACGA CGCCACGTCG ATTTGCTCGT 240

TGGGGCGGCT GCTCTCTGTT CCGCTATGTA CGTTGGG 277
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 277 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42:

CACCCCGGTC TCCGCTGCCG AAGTGAAGAA CATCAGTACC GGCTACATGG TGACCAACGA 60

CTGCACCAAT GATAGCATTA CCTGGCAACT CCAGGCTGCT GTCCTCCACG TCCCCGGGTG 120

CGTCCCGTGC GAGAAAGTGG GGAATACATC TCGGTGCTGG ATACCGGTCT CACCGAATGT 180

GGCCGTGCAG CAGCCCGGCG CCCTCACGCA GGGCTTACGG ACGCACATTG ACATGGTTGT 240

GATGTCCGCC ACGCTCTGCT CCGCTCTTTA CGTGGGG 277
(2) INFORMATION FOR SEQ ID NO:43: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 277 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:43: 

AGTGCCAGTG TCTGCAGTGG AAGTCAGGAA CATTAGTTCT AGCTACTACG CCACTAATGA 60

TTGCTCAAAC AACAGCATCA CCTGGCAGCT CACTGACGCA GTTCTCCATC TTCCTGGATG 120

CGTCCCATGT GAGAATGATA ATGGCACCTT GCATTGCTGG ATACAAGTAA CACCCAACGT 180

GGCTGTGAAA CACCGCGGTG CGCTCACTCG TAGCCTGCGA ACACACGTCG ACATGATCGT 240

AATGGCAGCT ACGGCCTGCT CGGCCTTGTA TGTGGGA 277
(2) INFORMATION FOR SEQ ID NO:44: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 194 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:44: 
GCAGAAAGCG TCTAGCCATG GCGTTAGTAT GAGTGTCGTA CAGCCTCCAG GCCCCCCCCT 60
CCCGGGAGAG CCATAGTGGT CTGCGGAACC GGTGAGTACA CCGGAATTGC CGGGAAGACT 120
GGGTCCTTTC TTGGATAAAC CCACTCTATG CCCGGTCATT TGGGCGTGCC CCCGCAAGAC 180
TGCTAGCCGA GTAG 194

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 194 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45:

GCAGAAAGCG TCTAGCCATG GCGTTAGTAT GAGTGTCGTA CAGCCTCCAG GCCCCCCCCT 60

CCCGGGAGAG CCATAGTGGT CTGCGGAACC GGTGAGTACA CCGGAATTAC CGGAAAGACT 120

GGGTCCTTTC TTGGATAAAC CCACTCTATG TCCGGTCATT TGGGCACGCC CCCGCAAGAC 180

TGCTAGCCGA GTAG 194



(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 194 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

GCAGAAAGCG TCTAGCCATG GCGTTAGTAT GAGTGTCGTG CAGCCTCCAG GACCCCCCCT 60

CCCGGGAGAG CCATAGTGGT CTGCGGAACC GGTGAGTACA CCGGAATTGC CAGGACGACC 120

GGGTCCTTTC TTGGATCAAC CCGCTCAATG CCTGGAGATT TGGGCGTGCC CCCGCGAGAC 180

TGCTAGCCGA GTAG 194 

(2) INFORMATION FOR SEQ ID NO:47: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 194 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:47: 

GCAGAAAGCG TCTAGCCATG GCGTTAGTAT GAGTGTCGTG CAGCCTCCAG GACCCCCCCT 60 

CCCGGGAGAG CCATAGTGGT CTGCGGAACC GGTGAGTACA CCGGAATTGC CAGGACGACC 120 

GGGTCCTTTC TTGGATCAAC CCGCTCAATG CCTGGAGATT TGGGCGTGCC CCCGCAAGAC 180 

TGCTAGCCGA GTAG 194 

(2) INFORMATION FOR SEQ ID NO:48:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 194 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:48:

GCGGAAAGCG CCTAGCCATG GCGTTAGTAC GAGTGTCGTG CAGCCTCCAG GACCCCCCCT 60

CCCGGGAGAG CCATAGTGGT CTGCGGAACC GGTGAGTACA CCGGAATCGC TGGGGTGACC 120

GGGTCCTTTC TTGGAGCAAC CCGCTCAATA CCCAGAAATT TGGGCGTGCC CCCGCGAGAT 180

CACTAGCCGA GTAG 194



(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 194 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

GCAGAAAGCG TCTAGCCATG GCGTTAGTAT GAGTGTCGTA CAGCCTCCAG GCCCCCCCCT 60

CCCGGGAGAG CCATAGTGGT CTGCGGAACC GGTGAGTACA CCGGAATTGC CGGGAAGACT 120 

GGGTCCTTTC TTGGATAAAC CCACTCTATG CCCGGCCATT TGGGCGTGCC CCCGCAAGAC 180 

TGCTAGCCGA GTAG 194 
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 194 base pairs

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
GCAGAAAGCG TCTAGCCATG GCGTTAGTAT GAGTGTCGTA CAGCCTCCAG GCCCCCCCCT 60
CCCGGGAGAG CCATAGTGGT CTGCGGAACC GGTGAGTACA CCGGAATTGC CGGGAAGACT 120 
GGGTCCTTTC TTGGATAAAC CCACTCTATG CCCGGCCATT TGGGCGTGCC CCCGCAAGAC 180 
TGCTAGCCGA GTAG 194 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 194 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: DNA (genomic) 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:51:

GCAGAAAGCG TCTAGCCATG GCGTTAGTAT GAGTGTCGTA CAGCCTCCAG GCCCCCCCCT 60

CCCGGGAGAG CCATAGTGGT CTGCGGAACC GGTGAGTACA CCGGAATTGC CAGGAAGACT 120

GGGTCCTTTC TTGGATAAAC CCACTCTATG CCTGGCCATT TGGGCGTGCC CCCGCAAGAC 180

TGCTAGCCGA GTAG 194 

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 194 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

GCAGAAAGCG TCTAGCCATG GCGTTAGTAT GAGTGTCGTA CAGCCTCCAG GTCCCCCCCT 60

CCCGGGAGAG CCATAGTGGT CTGCGGAACC GGTGAGTACA CCGGAATTGC CGGGAAGACT 120 

GGGTCCTTTC TTGGATAAAC CCACTCTATG CCCGGCCATT TGGGCGTGCC CCCGCAAGAC 180 

TGCTAGCCGA GTAG 194 
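
Each entry of the sequence listing above declares a LENGTH and then gives the bases in groups of ten with a running count at the end of each row. As a minimal sketch of how a transcribed entry can be checked against its declared length, the following Python example joins the rows and counts the bases. The function names `parse_listing_rows` and `check_length` are hypothetical helpers introduced here for illustration and are not part of the original specification; the example rows are those of SEQ ID NO:49.

```python
import re


def parse_listing_rows(rows):
    """Join sequence-listing rows into one string of bases.

    Each row holds space-separated groups of bases, optionally followed by
    a running base count (e.g. "... CCCCCCCCCT 60"), which is discarded.
    """
    bases = []
    for row in rows:
        # Keep only tokens made of nucleotide letters; drop numeric counters.
        bases.extend(
            tok for tok in row.split() if re.fullmatch(r"[ACGTacgt]+", tok)
        )
    return "".join(bases).upper()


def check_length(rows, declared_length):
    """Return (matches, actual_length) for a transcribed listing entry."""
    seq = parse_listing_rows(rows)
    return len(seq) == declared_length, len(seq)


# Rows of SEQ ID NO:49 as given in the listing (declared LENGTH: 194).
seq_id_49_rows = [
    "GCAGAAAGCG TCTAGCCATG GCGTTAGTAT GAGTGTCGTA CAGCCTCCAG GCCCCCCCCT 60",
    "CCCGGGAGAG CCATAGTGGT CTGCGGAACC GGTGAGTACA CCGGAATTGC CGGGAAGACT 120",
    "GGGTCCTTTC TTGGATAAAC CCACTCTATG CCCGGCCATT TGGGCGTGCC CCCGCAAGAC 180",
    "TGCTAGCCGA GTAG 194",
]

if __name__ == "__main__":
    matches, actual = check_length(seq_id_49_rows, 194)
    print(f"declared 194, counted {actual}, match: {matches}")
```

A mismatch between the declared and counted length usually points to a dropped or duplicated base in transcription rather than to a real difference in the sequence.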

What is claimed is: 

1. An oligonucleotide consisting of the sequence of Seq ID No. 1.

2. An oligonucleotide consisting of the sequence of Seq ID No. 2. 

3. An oligonucleotide consisting of the sequence of Seq ID No. 3. 

4. An oligonucleotide consisting of the sequence of Seq ID No. 4. 

5. An oligonucleotide consisting of the sequence of Seq ID No. 5. 

6. An oligonucleotide consisting of the sequence of Seq ID No. 6. 

7. An oligonucleotide consisting of the sequence of Seq ID No. 7.

8. An oligonucleotide consisting of the sequence of Seq ID No. 8. 

9. An oligonucleotide consisting of the sequence of Seq ID No. 9. 

10. A pair of PCR primers wherein the sense primer consists of Seq ID NO. 1
and the antisense primer is selected from the group consisting of Seq ID NO. 2, Seq ID
NO. 3, Seq ID NO. 4 and Seq ID NO. 5.

11. A pair of PCR primers wherein the antisense primer consists of Seq ID
NO. 6 and the sense primer is selected from the group consisting of Seq ID NO. 7, Seq ID
NO. 8, and Seq ID NO. 9.

12. A method of determining the HCV genotype of an HCV strain, said 
method comprising the steps of: 

(a) subjecting said HCV strain to one or more stages of PCR, wherein
the one or more stages of PCR utilizes a sense probe from the core or E1 region of the
HCV genome and an antisense probe from the core or E1 region of the HCV genome;

(b) forming a heteroduplex by denaturing and reannealing mixtures of 
the amplified product obtained in step (a) with DNA or RNA fragments of a known HCV 
genotype; 

(c) comparing the mobility of said heteroduplex on a system that 
separates by size with the mobility of a homoduplex of the DNA or RNA fragments of 
known genotype to determine the genotype of the HCV strain. 

13. The method of claim 12 wherein said HCV strain is subjected to two
stages of PCR, wherein the first set of primers comprise a universal sense probe from the
core or E1 regions of the HCV genome and a type specific antisense probe from the core
or E1 regions of the HCV genome, and wherein the second set of PCR primers comprise
a universal antisense probe from the core or E1 regions of the HCV genome and a type
specific sense probe from the core or E1 regions of the HCV genome.

14. The method of claim 12 wherein the first set of PCR primers are those 
according to claim 10 and wherein the second set of PCR primers are those according to 
claim 11. 

15. The method of claim 12 wherein said DNA or RNA fragments of a known 
genotype comprise a DNA probe. 

16. The method of claim 15 wherein said probe is single stranded.

17. The method of claim 16 wherein said DNA probe is radiolabeled.

18. The method of claim 16 wherein said single stranded DNA probe is
obtained by PCR amplification.

19. The method of claim 18 wherein said DNA probe is obtained by two step
PCR amplification utilizing the primers of claim 10 for the first step and claim 11 for the
second step.

20. The method of claim 12 wherein said HCV strain is present in an excess
in the mixture forming the heteroduplex. 

21. A method to predict the response to drug therapy of a strain of HCV from
a patient infected with said strain of HCV, said method comprising determining the 
sensitivity of known HCV genotypes to said drug therapy, determining the HCV 
genotype of said strain of HCV by the method according to claim 12, and comparing said 
HCV genotype of said strain prior to said drug therapy with said sensitivity of known 
HCV genotypes to said drug therapy. 

22. A method to predict the response to a therapeutic vaccine of a strain of 
HCV from a patient infected with said strain of HCV, said method comprising 
determining the sensitivity of known HCV genotypes to said therapeutic vaccine, 
determining the HCV genotype of said strain of HCV by the method according to claim 
12, and comparing said HCV genotype of said strain prior to administration of said 
therapeutic vaccine with said sensitivity of known HCV genotypes to said therapeutic 
vaccine. 

23. A method to predict the appropriateness of a prophylactic vaccine
composition for a given sample population, said method comprising determining the
genotype of said prophylactic vaccine, determining the predominance of known HCV
genotypes in said sample population by the method according to claim 12, and
comparing said HCV genotype of said prophylactic vaccine strain to the determined
predominant genotype prior to administration of said prophylactic vaccine to said
population sample.
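
The claimed method determines genotype from gel mobility and involves no computation. Purely as an illustrative in silico analogue, and on the general assumption that heteroduplex mobility falls as the two strands diverge, the sketch below scores a sample sequence against aligned reference sequences of known genotype and reports the reference with the fewest mismatches. The names `mismatch_fraction` and `closest_reference`, and the toy sequences and genotype labels, are hypothetical and are not taken from the specification.

```python
def mismatch_fraction(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


def closest_reference(sample, references):
    """Return the name of the reference with the lowest mismatch fraction.

    `references` maps a genotype label to an aligned reference sequence.
    """
    return min(
        references,
        key=lambda name: mismatch_fraction(sample, references[name]),
    )


if __name__ == "__main__":
    # Toy data only; real use would take aligned core/E1 amplicon sequences.
    refs = {"type_1": "ACGTACGA", "type_2": "TTTTACGT"}
    sample = "ACGTACGT"
    for name, ref in refs.items():
        print(name, mismatch_fraction(sample, ref))
    print("closest:", closest_reference(sample, refs))
```

A score of this kind can only suggest which probe is expected to give the least-retarded heteroduplex; it does not replace the mobility comparison recited in claim 12.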
HTA1    AACTCAAGCATTGTGTATGAAGCGGCGGACATGATCATGCACACCCCCGGGTGCGTGCCA

HTA2    AACTCAAGCATCGTGTATGAGGCAGCGGAAGTGATCATGCACATTCCCGGGTGCGTGCCC

HTA4    AACTCAAGCATAGTATATGAGGCAGCGGACATAATCATGCATACCCCCGGGTGCGTGCCC

HTA24   AACACGAGCATTGTGTATGAGGCAGCGGACTTGATCATGCACGTCCCCGGGTGCGTGCCC

HTA28   AACTCAAGCATCGTGTATGAGGCGGCGGAAGTGATCATGCACATTCCTGGGTGCGTGCCC

HCV-1   AACTCGAGTATTGTGTACGAGGCGGCCGATGCCATCCTGCACACTCCGGGGTGCGTCCCT

HCV-J   AACTCAAGTATTGTGTATGAGGCAGCGGACATGATCATGCACACCCCCGGGTGCGTGCCC

HC-J6   AATGATAGCATTACCTGGCAACTCCAGGCTGCTGTCCTCCACGTCCCCGGGTGCGTCCCG

HC-J8   AACAACAGCATCACCTGGCAGCTCACTGACGCAGTTCTCCATCTTCCTGGATGCGTCCCA

NZL-1   AATAGCAGTATTGTGTATGAGGCCGATGATGTCATTCTGCACACACCCGGCTGTGTACCT



HTA1    TGCGTCCGGGAGGGCAATCTCTCCCGCTGCTGGGTAGCGCTCACTCCCACGCTCGCGGCC

HTA2 TGCGTTCGGGAGAGCAATCTCTCCCXSCroCTGGGTAGOGCTCACCCCCACACT 

HTA4 TGTGTTCGGGAGGTCAACTCCTCCCGCrrcCTGGGCAGCGCTCACC 

HTA2 4 TGCGTTCXSGGAGGGCAACTCCTCCCGATGCTGGGTAGCGCTCACTCCCAC^ 

HTA28   TGCGTTCGGGAGGGCGACTTCTCCCGCTGCTGGGTAGCGCTCACCCCCACACTCGCGGCC

HCV-1   TGCGTTCGTGAGGGCAACGCCTCGAGGTGTTGGGTGGCGATGACCCCTACGGTGGCCACC

HCV-J   TGCGTCCGGGAGAGTAATTTCTCCCGTTGCTGGGTAGCGCTCACTCCCACGCTCGCGGCC

HC-J6   TGCGAGAAAGTGGGGAATACATCTCGGTGCTGGATACCGGTCTCACCGAATGTGGCCGTG

HC-J8   TGTGAGAATGATAATGGCACCTTGCATTGCTGGATACAAGTAACACCCAACGTGGCTGTG

NZL-1   TGTGTCCAGGACGGCAATACATCTACGTGCTGGACCCCAGTGACACCTACAGTGGCAGTC

JKla TGTGTTCGCGAGGGCAACGCCTCGAGGTGTTGGGTGGCGATGACCC^ 



HTA1    AGAAACAGCAGCGTTCCTACTACGACAATACGACGCCATGTCGACTTGCTAGTAGGAGCG

HTA2    AGGAACAGCAGCGTCCCCACCACGACAATACGACGCCACGTCGACTTGCTCGTTGGGGCG

HTA4    AGGAACTCCAGCGTGCCCACTACGACAATACGACGCCACGTCGACTTGCTCGTTGGGGCG

HTA24   AGGAACAGCAGTGTCCCCGTTACGACCATACGACGCCACGTCGATTTGCTCGTTGGGGCG

HTA28   AGGAATAACAGCGTCCCCACTACGACAATACGACGCCACGTCGACTTGCTCGTTGGGGCG

HCV-1   AGGGATGGCAAACTCCCCGCGACGCAGCTTCGACGTCACATCGATCTGCTTGTCGGGAGC

HCV-J   AGGAACAGCAGCATCCCCACCACGACAATACGACGCCACGTCGATTTGCTCGTTGGGGCG

HC-J6   CAGCAGCCCGGCGCCCTCACGCAGGGCTTACGGACGCACATTGACATGGTTGTGATGTCC

HC-J8   AAACACCGCGGTGCGCTCACTCGTAGCCTGCGAACACACGTCGACATGATCGTAATGGCA

NZL-1   AGGTACGTCGGAGCAACTACTGCTTCGATACGCAGTCATGTGGACCTATTAGTAGGCGCG



HTA1 GCTGCTTTTTGCTCCGCCATGTACGTGGGGGACCT^ 

HTA2 GCTGCCTTCTGCTCCGCTATGTATGTGGGGGATCTCIX3CGGATCTGTTTT^ 

HTA4    GCTGCTTTCTGCTCCGCTATGTACGTGGGGGATCTATGCGGATCTGTTCTACTTGTCTCT

HTA24   GCTGCTCTTTGCTCCGCCATGTACGTGGGGGATCTCTGCGGATCTGTCTTCCTCGCTTCC

HTA28   GCTGCCTTCTGCTCCGCTATGTACGTGGGGGATCTCTGCGGATCTGTTTTCCTTGTCTCC

HCV-1   GCCACCCTCTGTTCGGCCCTCTACGTGGGGGACCTATGCGGGTCTGTCTTTCTTGTCGGC

HCV-J   GCTGCTCTCTGTTCCGCTATGTACGTTGGGGATCTCTGCGGATCCGTTTTTCTCGTCTCC

HC-J6   GCCACGCTCTGCTCCGCTCTTTACGTGGGGGACCTCTGCGGTGGGGTGATGCTTGCAGCC

HC-J8   GCTACGGCCTGCTCGGCCTTGTATGTGGGAGATGTGTGCGGGGCCGTGATGATTCTATCG

NZL-1   GCCACGATGTGCTCTGCGCTCTACGTGGGTGATATGTGTGGGGCTGTCTTTCTCGTGGGA



HTA1 CAACTGTTCACCTTCTCGCCCCGCCGGCATCATACAGTACAGGACTGCAATTGCTCGATC 

HTA2 CAACTGTTCACCTTTTCGCCTCGCCGGCATGAGACAGTACAGGACTGCAATTGTTCAATC 

HTA4 CAGCTGTTCACCTTCTCACCTCGCCGGCACGAGACAGTGCAGGACTGCAATTGTTCAATC 

HTA24 CAGTTGTTCACTTTCTCGCCTCGCCAGCATCAGACGGTACAGGACTGCAACTGCTCAATC 

HTA28 CAACTGTTCACCTTTTCGCCTCGCCGGCATGCGACAGTACAGGACTGCAATTGTTCAATC 

HCV-1   CAACTCTTCACCTTCTCTCCCA^

HCV- J CAGCTGTTCACCTTCTCACCTCGCCGGTATGAGACGGTACAAGATTGCAAT^ 

HC - J6 CAGATGTTCATTGTCTCGCCACAGCACCACTGGTTTGTGCAAGACT 

HC-J8   CAGGCTTTCATGGTATCACCACAACGCCACAACTTCACCCAAGAGTGCAACTGTTCCATC

NZL- 1 CAAGCCTTCACGTTCAGACCTCGACGCCATCAAACXK3TCCAGA 
JK1a    CGCAACTCCACGGGGCTTTACCACGTCACCAATGATTGCCCTAACTCGAGTATTGTGTAC

JK2a    AAGAACACCAGCGACAGCTACATGGTGACCAATGACTGCCAAAATGACAGCATCACCTGG

JK2b    AGGAACATCAGTTCTAGCTACTACGCCACTAATGACTGCTCGAACAACAGCATCACCTGG

S W8 3 . 2 c AAGGACACCGGCGACTCCTACATGCCGACCAACGATTGCTCCAACTCnrAGTATCGTTTGG 

HTA2 3 AAAAACACCAGCATCTCCTATATGGCGACCAACGACTGCTCCAATTCCAGCATCGCTTG^ 

HTA3 3 AAGAACACCAGCGACTCCTACATGGCGACTAACGACTC 

HTA3 0 AAGAACACCAGCACCTCCTACATGGTGACTAACGATTGCTCCAACTCCAGCATCGTTT^ 

HCV-1   CGCAACTCCACGGGGCTTTACCACGTCACCAATGATTGCCCTAACTCGAGTATTGTGTAC

HCV-J   CGCAACGTGTCCGGGATATACCATGTCACGAACGACTGCTCCAACTCAAGTATTGTGTAT

HC-J8   AGGAACATTAGTTCTAGCTACTACGCCACTAATGATTGCTCAAACAACAGCATCACCTGG

NZL-1   CGGAATACGTCTGGCCTCTACGTCCTTACCAACGACTGTTCCAATAGCAGTATTGTGTAT

HC-J6   AAGAACATCAGTACCGGCTACATGGTGACCAACGACTGCACCAATGATAGCATTACCTGG

* ** ** ** ** ** ** ** *+ * 

JKla GAGACXX3CCX3ATGCGATCCIX3CACACTCCGGGG 

JK2 a CAGCCTGAGGCrTGCGGTCCTCGACXaTCCC^ 

JK2b CAGCTCACCAACGCAGTTCTCCACCTTCCCGGATGCGTCCCATGTGAGAATAATAATGGC 

SW8 3.2c CAG CTTG AAGG AG CAGTGCTTCATACTCCTGG ATGCGTCCCTTGTG AGCX5T ACCX5CCAAC 

HTA23   CAGTTTGACGGCGCAGTGCTCCATACTCCTGGATGTGTCCCTTGCGAACGGACCGGCAAC

HTA33   CAGCTTGAGGACGCAGTGCTCCATGTCCCTGGATGTGTCCCTTGTGAGAAGACTGGCAAT

HTA30   CAACTTGAAGGCGCAGTGCTCCATGTTCCTGGATGTGTCCCTTGTGAGCAGATCGGCAAC

HCV-1   GAGGCGGCCGATGCCATCCTGCACACTCCGGGGTGCGTCCCTTGCGTTCGTGAGGGCAAC

HCV-J   GAGGCAGCGGACATGATCATGCACACCCCCGGGTGCGTGCCCTGCGTCCGGGAGAGTAAT

HC-J8   CAGCTCACTGACGCAGTTCTCCATCTTCCTGGATGCGTCCCATGTGAGAATGATAATGGC

NZL-1   GAGGCCGATGATGTCATTCTGCACACACCCGGCTGTGTACCTTGTGTCCAGGACGGCAAT

HC-J6   CAACTCCAGGCTGCTGTCCTCCACGTCCCCGGGTGCGTCCCGTGCGAGAAAGTGGGGAAT
* * * ** ** ** ** ** ** ** *

JK1a    GCCTCGAGGTGTTGGGTGGCGATGACCCCTACGGTGGCCACCAGGGATGGCAAACTCCCC

JK2a ACATCTCGGTGCTGGATACCGGTCTCACCAAACGTGGCTGTGCGGCAGCCCGGCGCCCTC 

JK2b ACCTTGCATTGCTGGATACAAGTAACACCTAATGTGGCCGTAAAACATCGCGGCGCACTC 

SW8 3.2c GTCTCTCGATGTTGGGTGCCGGTTGCCCCCAATCTCGCCATAAGTCAACCTGGCGCTCTC 

HTA23 GCGTCCCGGTGTTGGGTGCCGGTTGCCCCCAATGTGGCTATAAGACAACCCGGCGCCCTC 

HTA33 ACGTCTCGGTGCTGGGTGCCGGTTACCCCCAATGTGGCTACAAGTCAACCCGGCGCTCTC 

HTA3 0 GTGTCTCAGTGTTGGGTGCCGGTTACCCCCAATATGGCCATAAGTACACCCGGCGCTCT 

HCV-1   GCCTCGAGGTGTTGGGTGGCGATGACCCCTACGGTGGCCACCAGGGATGGCAAACTCCCC

HCV-J   TTCTCCCGTTGCTGGGTAGCGCTCACTCCCACGCTCGCGGCCAGGAACAGCAGCATCCCC

HC-J8   ACCTTGCATTGCTGGATACAAGTAACACCCAACGTGGCTGTGAAACACCGCGGTGCGCTC

NZL-1   ACATCTACGTGCTGGACCCCAGTGACACCTACAGTGGCAGTCAGGTACGTCGGAGCAACT

HC-J6   ACATCTCGGTGCTGGATACCGGTCTCACCGAATGTGGCCGTGCAGCAGCCCGGCGCCCTC
* * * * * * * * ** * * ** 



JKla GCGACGCT^GCTTCGACGTCACATCGATCTGCTTGTCGGGAGCGCCACCCTCTGTTCGGCC 

JK2a ACGCAGGGCTTGCGGACGCACATCGACATGATTGTGATGTCCGCCACGCTCTGCTCCGCT 

JK2 b ACTCACAACCTGCGGAGACATGTCGAGATGATCGTAATGGCAGCTACGGTCT 

SW83 . 2c ACTAAGGGCCTGCGAGCACACATCGATATCATCGTGATGTCTGCTACGGTCTGTTCTGCC 

HTA23 ACTAAGGGCATACX3AACGCACATTGATGTCATCGTAATGTCTGCTACGCTCTGTTCTGCC 

HTA33   ACCAGGGGCTTGCGGACGCACATCGATGTCATCGTGATGTCAGCCACGCTCTGCTCCGCT

HTA30   ACTAAGGGCTTGCGAACGCACATCGACGGCATCGTGATGTCCGCTACGCTCTGTTCTGCC

HCV-1   GCGACGCAGCTTCGACGTCACATCGATCTGCTTGTCGGGAGCGCCACCCTCTGTTCGGCC

HCV-J   ACCACGACAATACGACGCCACGTCGATTTGCTCGTTGGGGCGGCTGCTCTCTGTTCCGCT

HC-J8 ACTCGTAGCCTGCGAACACACGTCGACATGATCGTAATGGCAGCTACGGCCTGCTCGGCC 

NZL- 1 ACTGCTTCGATACGCAGTCATGTGGACCTATTAGTAGGCGCGGCCACGATGTGCTCTGCG 

HC- J6 ACGCAGGGCTTACGGACGCACATTGACATGGTTGTGATGTCCGCCACGCTCTGCTCCGCT 



* + * * * * 



JK1a    CTCTACGTGGGGGATCTGTGCGGGTCTGTCTTTCTTGTCGGCCAACTGTTTACCTTCTCT

JK2a    CTCTACGTGGGGGACCTCTGTGGCGGGATGATGCTCGCAGCCCAGATGTTCATCGTTTCG

JK2b    TTGTACGTAGGAGACGTGTGTGGGGCTGTGATGATTGTGTCTCAGGCCCTTATAATATCA

SW83.2c CTTTATGTGGGGGACGTGTGTGGCGCGCTGATGCTGGCCGCTCAGGTCGTCGTCGTGTCG

HTA23   CTTTACGTGGGGGACGTGTGTGGTGCGCTGATGATTGCCGCTCAGGTCGTCATTGTGTCT

HTA33   CTCTATGTGGGGGACGTGTGTGGCGCGTTGACGATAGCCGCTCAGGTTGTCATCGTATCG

HTA30   CTTTATGTGGGGGACGTGTGTGGCGCGTTGATGATAGCCGCCCAGGTCGTCATCGTATCG

HCV - 1 CTCTACGTGGGGG ACCTATGCGGGTCTGTCTTTCri^ 

HCV - J ATGTACGTTGGGGATCTCTGCGGATCCGTT^ 

HC-J8   TTGTATGTGGGAGATGTGTGCGGGGCCGTGATGATTCTATCGCAGGCTTTCATGGTATCA

NZL - 1 CTCTACGTGGGTGATATGTGTGGGGCTGTCTTTC 

HC-J6   CTTTACGTGGGGGACCTCTGCGGTGGGGTGATGCTTGCAGCCCAGATGTTCATTGTCTCG
* ** ** ** ** # ** ** * * ** * * 

JK1a    CCCAGGCGCCACTGGACGACGCAAGGTTGCAAT

JK2a    CCGCAGAACCACTGGTTCGTGCAGGAATGCAAT

JK2b    CCAGAACACCATAACTTCACCCAAGAGTGCAAC

SW83.2c CCACAACACCATACGTTTGTCCAGGAATGCAAC

HTA23   CCGCAGCATCACCACTTTGTCCAGGACTGCAAT

HTA33   CCACGGCACCACCACTTTGTCCAGGACTGCAAT

HTA30   CCACAGCACCACCACTTTGTCCACGACTGCAAC

HCV-1   CCCAGGCGCCACTGGACGACGCAAGGTTGCAAT

HCV-J   CCTCGCCGGTATGAGACGGTACAAGATTGCAAT

HC-J8   CCACAACGCCACAACTTCACCCAAGAGTGCAAC

NZL-1   CCTCGACGCCATCAAACGGTCCAGACCTGTAAC

HC-J6   CCACAGCACCACTGGTTTGTGCAAGACTGCAAT
HTA3    TCATCCAACATCTAGTCTAGAGTGGCGGAATACGTCTGGCCTCTATGTCCTTACCAACGA

HTA7    TCATCCAACATCTAGTCTAGAGTGGCGGAATACGTCTGGCCTCTATGTCCTTACCAACGA

HTA18   TCATCCAACATCTAGTCTAGAGTGGCGGAATACGTCTGGCCTCTATGTCCTTACCAACGA

HTA2 0 TCATCCAACATCTAGTCTAGAGTGGCX5GAATA(X5TCTGGCCTCTATGTCCT^ 

HTA22 TCATCCAACATCTAGTCTAGAGTGGCGGAATACGTCrrGGC 

HTA2 6 TCATCGAACATCTAGTCTAGAGTGGCXX3AATACGTCrrGGCCT 

HTA35   TCATCCAACATCTAGTCTAGAGTGGCGGAATACGTCTGGCCTCTATGTCCTCACCAACGA

NZL-1   TCATCCAGCAGCCAGTCTAGAGTGGCGGAATACGTCTGGCCTCTACGTCCTTACCAACGA

HCV-1   TGTGCCCGCTTCGGCCTACCAAGTGCGCAACTCCACGGGGCTTTACCACGTCACCAATGA

HCV-J   CATCCCAGCTTCCGCTTACGAGGTGCGCAACGTGTCCGGGATATACCATGTCACGAACGA

HC-J6   CACCCCGGTCTCCGCTGCCGAAGTGAAGAACATCAGTACCGGCTACATGGTGACCAACGA

HC-J8   AGTGCCAGTGTCTGCAGTGGAAGTCAGGAACATTAGTTCTAGCTACTACGCCACTAATGA

** * * ** ** ** + * ** 

HTA3    CTGTTCCAATAACATTATTGTGTATGAGGCCGATGACGTCATCCTGCACACGCCCGGCTG

HTA7    CTGTTCCAATAACATCATTGTGTATGAGGCCGATGACGTCATCCTGCACGCACCCGGCTG

HTA1 8 CTG TTCCAATAAT ATTATTGTGTATG AGGCCG ACXj ACXTTCATCCTC CACG CCCCCGG CTG 

HTA2 0 CTCTTCCAATAACATTATTGTGTATGAGGCCGATGACX3TCATCCT 

HTA2 2 CTGTTCGAATAACAGTATTGTGTATGAGGCXX^ATC^ 

HTA26   CTGTTCTAATAACATTATTGTGTATGAGGCCGATGACGTCATCCTGCACACACCCGGCTG

HTA35   CTGTTCCAACAACATTATTGTGTATGAGGCCGATGACGTCATTCTGCACACGCCCGGCTG

NZL-1   CTGTTCCAATAGCAGTATTGTGTATGAGGCCGATGATGTCATTCTGCACACACCCGGCTG

HCV-1   TTGCCCTAACTCGAGTATTGTGTACGAGGCGGCCGATGCCATCCTGCACACTCCGGGGTG

HCV-J   CTGCTCCAACTCAAGTATTGTGTATGAGGCAGCGGACATGATCATGCACACCCCCGGGTG

HC-J6   CTGCACCAATGATAGCATTACCTGGCAACTCCAGGCTGCTGTCCTCCACGTCCCCGGGTG

HC-J8   TTGCTCAAACAACAGCATCACCTGGCAGCTCACTGACGCAGTTCTCCATCTTCCTGGATG

***** * ** * * * * ** ** ** ** 

HTA3 TGTACCTTGTGTTCAGGACGGTAATACATCCAAGTGCTGGACCCCAGTGACACCTACAGT 

HTA7 TGTACCTTGTGTTCAGGACGGCAATACATCCACGTGCTGGACCCCAGTGACACCTACAGT 

HTA18   TGTACCTTGTGTTCAGGACGGCAATACATCCACGTGCTGGATCCCAGTGACACCTACAGT

HTA2 0 TGTACCTTGTGTTCAGGACGGCAATA(^TCCACGTGCTGGACCCCAGTG 

HTA22   TGTACCTTGTGTTCAAGCCAACAATAAATCCAAATGCTGGACCCCAGTGACACCTACAGT

HTA26   TGTACCTTGTGTTCAGGACGGCAATGCATCCACGTGCTGGACCCCAGTAACACCTACAGT

HTA35   CGTACCTTGTGTACAGGACGGCAATACATCCACGTGCTGGACCCCAGTGACACCTACAGT

NZL-1   TGTACCTTGTGTCCAGGACGGCAATACATCTACGTGCTGGACCCCAGTGACACCTACAGT

HCV-1   CGTCCCTTGCGTTCGTGAGGGCAACGCCTCGAGGTGTTGGGTGGCGATGACCCCTACGGT

HCV-J   CGTGCCCTGCGTCCGGGAGAGTAATTTCTCCCGTTGCTGGGTAGCGCTCACTCCCACGCT

HC-J6   CGTCCCGTGCGAGAAAGTGGGGAATACATCTCGGTGCTGGATACCGGTCTCACCGAATGT

HC-J8   CGTCCCATGTGAGAATGATAATGGCACCTTGCATTGCTGGATACAAGTAACACCCAACGT
** ** ** * * * ** *** * * ** * * 

HTA3 GGGAGTCAGGTACGTCGGAGCAACCACCGCTTGAATACG 

HTA7 GGCAGTCAGGTACGTCGGAGCAACCAGCGCTTCAATACGCAGCCATGTGG^ 

HTA1 8 GGCAGTCAGGTACGCCGGAGCAACCACCGCTTCAATACGCAGCCATGTGG^ 

HTA20   ATCAGTCAGGTACGTCGGAGCAACCACCGCTTCAATACGCAGCCATGTGGACCTACTATT

HTA22   ATCAGTCGAGTACGTCGGAGCAACCACCGCTTCAATACGCAGCCATGTGGACCTACTATT

HTA26   ATCAGTCAGGTACGTCGGAGCAACCACCGCTTCAGTACGCAGCCATGTGGACCTACTATT

HTA35   GGCAGTCAGGTACGTCGGAGCAACTACCGCTTCAATACGCAGCCATGTGGACCTATTATT

NZL-1   GGCAGTCAGGTACGTCGGAGCAACTACTGCTTCGATACGCAGTCATGTGGACCTATTAGT

HCV-1   GGCCACCAGGGATGGCAAACTCCCCGCGACGCAGCTTCGACGTCACATCGATCTGCTTGT

HCV-J   CGCGGCCAGGAACAGCAGCATCCCCACCACGACAATACGACGCCACGTCGATTTGCTCGT

HC-J6   GGCCGTGCAGCAGCCCGGCGCCCTCACGCAGGGCTTACGGACGCACATTGACATGGTTGT

HC-J8   GGCTGTGAAACACCGCGGTGCGCTCACTCGTAGCCTGCGAACACACGTCGACATGATCGT
* * * * * * * ********



HTA3    GGGCGCGGCCACGATGTGCTCTGCGCTCTACGTGGGT
HTA7    GGGCGCGGCCACGATGTGCTCTGCGCTCTACGTGGGT
HTA18   GGGCGCGGCCACGATGTGCTCTGCGCTCTACGTGGGT
HTA20   GGGCGCGGCCACGATGTGCTCCGCGCTCTACGTGGGT
HTA22   GGGCGCGGCCACGATGTGCTCTGCGCTCTACGTGGGT
HTA26   GGGCGCGGCCACGATGTGCTCTGCGCTCTATGTGGGT
HTA35   GGGCGCGGCCACGATGTGCTCTGCGCTCTACGTGGGT
NZL-1   AGGCGCGGCCACGATGTGCTCTGCGCTCTACGTGGGT
HCV-1   CGGGAGCGCCACCCTCTGTTCGGCCCTCTACGTGGGG
HCV-J   TGGGGCGGCTGCTCTCTGTTCCGCTATGTACGTTGGG
HC-J6   GATGTCCGCCACGCTCTGCTCCGCTCTTTACGTGGGG
HC-J8   AATGGCAGCTACGGCCTGCTCGGCCTTGTATGTGGGA
-GCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCCCCCCCC
-GCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCCCCCCCC 
-GCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTGCAGCCTCCAGGACCCCCCC 
-GCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTGCAGCCTCCAGGACCCCCCC 
-GCGGAAAGCGCCTAGCCATGGCGTTAGTACGAGTGTCGTGCAGCCTCCAGGACCCCCCC 

-GCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCCCCCCCC
-GCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCCCCCCCC
-GCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGCCCCCCCC
-GCAGAAAGCGTCTAGCCATGGCGTTAGTATGAGTGTCGTACAGCCTCCAGGTCCCCCCC
** ******* ****************** ********* *********** ******* 

TCCCGGGAGAGCCATAGTGGTCTGCGGAACCGGTGAGTACACCGGAATTGCCGGGAAGAC

TCCCGGGAGAGCCATAGTGGTCTGCGGAACCGGTGAGTACACCGGAATTACCGGAAAGAC 

TCCCGGGAGAGCCATAGTGGTCTGCGGAACCGGTGAGTACACCGGAATTGCCAGGACGAC 

TCCCGGGAGAGCCATAGTGGTCTGCGGAACCGGTGAGTACACCGGAATTGCCAGGACGAC 

TCCCGGGAGAGCCATAGTGGTCTGCGGAACCGGTGAGTACACCGGAATCGCTGGGGTGAC 

TCCCGGGAGAGCCATAGTGGTCTGCGGAACCGGTGAGTACACCGGAATTGCCGGGAAGAC 

TCCCGGGAGAGCCATAGTGGTCTGCGGAACCGGTGAGTACACCGGAATTGCCGGGAAGAC 

TCCCGGGAGAGCCATAGTGGTCTGCGGAACCGGTGAGTACACCGGAATTGCCAGGAAGAC 

TCCCGGGAGAGCCATAGTGGTCTGCGGAACCGGTGAGTACACCGGAATTGCCGGGAAGAC 
************************************************ * * *** 

TGGGTCCTTTCTTGGATAAACCCACTCTATGCCCGGTCATTTGGGCGTGCCCCCGCAAGA 

TGGGTCCTTTCTTGGATAAACCCACTCTATGTCCGGTCATTTGGGCACGCCCCCGCAAGA 

CGGGTCCTTTCTTGGATCAACCCGCTCAATGCCTGGAGATTTGGGCGTGCCCCCGCGAGA 

CGGGTCCTTTCTTGGATCAACCCGCTCAATGCCTGGAGATTTGGGCGTGCCCCCGCAAGA 

CGGGTCCTTTCTTGGAGCAACCCGCTCAATACCCAGAAATTTGGGCGTGCCCCCGCGAGA 

TGGGTCCTTTCTTGGATAAACCCACTCTATGCCCGGCCATTTGGGCGTGCCCCCGCAAGA 

TGGGTCCTTTCTTGGATAAACCCACTCTATGCCCGGCCATTTGGGCGTGCCCCCGCAAGA

TGGGTCCTTTCTTGGATAAACCCACTCTATGCCTGGCCATTTGGGCGTGCCCCCGCAAGA

TGGGTCCTTTCTTGGATAAACCCACTCTATGCCCGGCCATTTGGGCGTGCCCCCGCAAGA 
*************** ***** *** ** * * ******** ******** *** 

CTGCTAGCCGAGTAG
CTGCTAGCCGAGTAG
CTGCTAGCCGAGTAG
CTGCTAGCCGAGTAG
TCACTAGCCGAGTAG
CTGCTAGCCGAGTAG
CTGCTAGCCGAGTAG
CTGCTAGCCGAGTAG
CTGCTAGCCGAGTAG
************ 
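
The rows of asterisks beneath each alignment block appear to mark columns in which every aligned sequence carries the same base. This reading is an assumption, since the figure legend is not part of this section. Under that assumption, a marker row can be regenerated from the aligned rows as in the Python sketch below; `conservation_line` is a hypothetical helper name, and the short rows in the usage example are toy data rather than sequences from the figures.

```python
def conservation_line(rows, gap="-"):
    """Build a marker row: '*' where all rows agree on a base, ' ' elsewhere.

    Columns containing a gap character are never marked as conserved.
    """
    if not rows:
        raise ValueError("at least one aligned row is required")
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows must have the same aligned length")
    marks = []
    for i in range(width):
        column = {r[i] for r in rows}
        conserved = len(column) == 1 and gap not in column
        marks.append("*" if conserved else " ")
    return "".join(marks)


if __name__ == "__main__":
    toy_rows = ["-GCAGAAAG", "-GCGGAAAG", "-GCAGAAAG"]
    for row in toy_rows:
        print(row)
    print(conservation_line(toy_rows))
```

Regenerating the marker row in this way also gives a quick check on a transcribed block: a column marked with an asterisk in the figure but not by the function indicates a likely transcription error in one of the rows.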